Web Spam Detection Using Different Features
Sumit Sahu1, Bharti Dongre2, Rajesh Vadhwani3
1Sumit Sahu, Computer Science and Engineering, MANIT, Bhopal, India.
2Bharti Dongre, Computer Science and Engineering, MANIT, Bhopal, India.
3Rajesh Vadhwani, Computer Science and Engineering, MANIT, Bhopal, India.
Manuscript received on June 18, 2011. | Revised Manuscript received on June 28, 2011. | Manuscript published on July 05, 2011. | PP: 70-73 | Volume-1 Issue-3, July 2011. | Retrieval Number: C062061311
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Spamming is any deliberate action taken solely to boost a web page’s position in search engine results to a level incommensurate with the page’s real value. Web spam refers to the web pages that result from spamming; it is a deliberate manipulation of search engine indexes and is one of the methods used in search engine optimization. Implementing web spam detection in a search engine reduces redundant and undesirable results. In this paper we discuss the features that are responsible for web page ranking. We also report the results of different classification techniques on a dataset that we derived from the WEBSPAM-UK2006 dataset. Finally, we propose a feature that helps in web spam detection.
Keywords: Web spam; feature selection; classification technique; N-gram algorithm.