Semantic Document Classification using Lexical Chaining & Fuzzy Approach
Upasana Pandey1, S. Chakraverty2, Bhawna Juneja3, Ashima Arora4, Pratishtha Jain5
1Upasana Pandey, Computer Engineering, Netaji Subhas Institute of Technology, Delhi University, New Delhi, India.
2S. Chakraverty, Computer Engineering, Netaji Subhas Institute of Technology, Delhi University, New Delhi, India.
3Bhawna Juneja, Information Technology, Netaji Subhas Institute of Technology, Delhi University, New Delhi, India.
4Ashima Arora, Information Technology, Netaji Subhas Institute of Technology, Delhi University, New Delhi, India.
5Pratishtha Jain, Information Technology, Netaji Subhas Institute of Technology, Delhi University, New Delhi, India.
Manuscript received on October 13, 2011. | Revised Manuscript received on October 24, 2011. | Manuscript published on November 05, 2011. | PP: 367-371 | Volume-1 Issue-5, November 2011. | Retrieval Number: E0245101511/2011©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: We propose a novel approach to classifying documents into categories using lexical chaining. The proposed text categorization technique extracts lexical features of the words occurring in a document. Two kinds of lexical chains, one based on WordNet and one on Wikipedia as reference sources, are created using the semantic neighborhood of tokens. The strength of each lexical chain is determined from TF-IDF, category keyword strength, and the relative position of tokens in the document. Each category is assigned a weight based on the value obtained from the lexical chain computation. Fuzzy logic is then used to generate a range for each category by means of a triangular membership function, and the document is assigned to the category whose range criteria it satisfies. Lexical chaining is widely applicable, for example in automated email spam filtering, topic spotting, and email routing.
Keywords: Lexical Chaining, TF-IDF, Wikipedia, WordNet.
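The fuzzy step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the membership parameters or the exact decision rule, so everything below is an assumption made for illustration: the function names, the (a, b, c) triangle parameters, the sample scores, and the choice to pick the category with the highest membership degree are all hypothetical and are not taken from the paper.

```python
def triangular_membership(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if not (a < b < c):
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


def classify(chain_scores, category_ranges):
    """Return (category, degree) for the category with the highest membership.

    Returns (None, 0.0) if no category score falls inside its range.
    Assumption: "satisfying the range criteria" is read as non-zero membership.
    """
    best_category, best_degree = None, 0.0
    for category, score in chain_scores.items():
        a, b, c = category_ranges[category]
        degree = triangular_membership(score, a, b, c)
        if degree > best_degree:
            best_category, best_degree = category, degree
    return best_category, best_degree


# Hypothetical lexical chain weights and ranges, for illustration only.
chain_scores = {"sports": 0.62, "politics": 0.15}
category_ranges = {"sports": (0.3, 0.6, 0.9), "politics": (0.3, 0.6, 0.9)}
result = classify(chain_scores, category_ranges)
print(result)
```

In this sketch the per-category weight stands in for the value obtained from the lexical chain computation. How that weight combines TF-IDF, keyword strength, and token position is not specified in the abstract and is left out here.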