Extracting Semantic Structure of Web Pages Using Graph Grammar Induction Algorithm
B. Venkatesh¹, P. Prakash²
¹B. Venkatesh, Department of Computer Science and Engineering, K.S.R. College of Engineering, Thiruchengode, India.
²P. Prakash, Department of Computer Science and Engineering, K.S.R. College of Engineering, Thiruchengode, India.
Manuscript received on March 01, 2014. | Revised Manuscript received on March 04, 2014. | Manuscript published on March 05, 2014. | PP: 203-207 | Volume-4 Issue-1, March 2014. | Retrieval Number: A2162034114/2014©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: With the growth of the Web, there is strong interest in interpreting web pages and extracting useful information from them. One major challenge in web interface interpretation is discovering the semantic structure that underlies a web interface. Several heuristic approaches have been developed to discover and group semantically related interface objects. However, those approaches do not satisfactorily solve the problem of non-uniformity, and they cannot label the role of each object. Distinct from existing approaches, this paper develops a robust and formal approach to recovering interface semantics using graph grammar induction. Because it can express spatial specifications in its abstract syntax, the spatial graph grammar induction algorithm (SGGI) is chosen to perform the semantic grouping and interpretation of segmented screen objects. Instead of analyzing HTML source code, we apply an efficient image processing technique to recognize atomic interface objects in a screenshot of an interface and produce a spatial graph, which records the significant spatial relations among the recognized objects. A spatial graph is more concise than the corresponding Document Object Model (DOM) structure and therefore facilitates interface analysis and interpretation. Based on the spatial graph, the SGGI parser recovers the hierarchical relations among interface objects.
Keywords: Content extraction, Image Segmentation, Graph Grammar Induction Algorithm, Spatial Parsing.
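To make the idea of a spatial graph concrete, the sketch below builds one from bounding boxes of already-recognized interface objects. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Obj` type, the three relation labels (`contains`, `above`, `left_of`), the `max_gap` pixel threshold used to decide which relations count as "significant", and the function names are all hypothetical. It does not perform image segmentation or SGGI parsing. It only shows the kind of compact, relation-labelled graph that such a parser could consume in place of a DOM tree.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Obj:
    # Hypothetical representation of one recognized atomic interface object:
    # a label plus its axis-aligned bounding box in screenshot pixels.
    name: str
    x: int
    y: int
    w: int
    h: int

    @property
    def right(self) -> int:
        return self.x + self.w

    @property
    def bottom(self) -> int:
        return self.y + self.h


def _overlap(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if the intervals [a0, a1) and [b0, b1) share a positive length."""
    return min(a1, b1) > max(a0, b0)


def relation(a: Obj, b: Obj, max_gap: int):
    """Coarse spatial relation from a to b, or None if it is not 'significant'.

    Assumed rules: containment, or vertical/horizontal adjacency where the
    boxes' projections overlap and the gap is at most max_gap pixels.
    """
    if (a != b and a.x <= b.x and a.y <= b.y
            and a.right >= b.right and a.bottom >= b.bottom):
        return "contains"
    if 0 <= b.y - a.bottom <= max_gap and _overlap(a.x, a.right, b.x, b.right):
        return "above"
    if 0 <= b.x - a.right <= max_gap and _overlap(a.y, a.bottom, b.y, b.bottom):
        return "left_of"
    return None


def spatial_graph(objs, max_gap=20):
    """Build a directed spatial graph as a list of (src, relation, dst) edges."""
    edges = []
    for a, b in permutations(objs, 2):  # every ordered pair of distinct objects
        rel = relation(a, b, max_gap)
        if rel is not None:
            edges.append((a.name, rel, b.name))
    return edges


# Usage: a toy "screenshot" with a panel holding a title, two links and a logo.
objs = [
    Obj("panel", 0, 0, 300, 200),
    Obj("title", 10, 10, 200, 30),
    Obj("link1", 10, 50, 120, 20),
    Obj("link2", 10, 80, 120, 20),
    Obj("logo", 150, 50, 60, 60),
]
edges = spatial_graph(objs)
for edge in edges:
    print(edge)
```

The graph keeps only nine edges for five objects, because pairs that are far apart or have no overlapping projection are dropped. This pruning is what makes a spatial graph smaller than a DOM tree full of layout-only nodes. A grammar-based parser would then rewrite such edge patterns (for example, a vertical chain of same-width links) into higher-level groups.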