Duplicity Detection System for Digital Documents
Ranjeet Singh1, Chiranjit Dutta2
1Mr. Ranjeet Singh, Faculty of Information Technology, SRM University, NCR Campus, India.
2Mr. Chiranjit Dutta, Faculty of Information Technology, SRM University, NCR Campus, India.
Manuscript received on November 01, 2012. | Revised Manuscript received on November 02, 2012. | Manuscript published on November 05, 2012. | PP: 24-28 | Volume-2 Issue-5, November 2012. | Retrieval Number: E0984092512/2012©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Plagiarism detection is a challenging problem. Thousands of documents are available on the Internet today, yet there are no adequate tools to guarantee their uniqueness across so large a domain. PDF documents form a significant portion of this vast collection. Copy detection in a digital document database can give publishers and newsfeed services the guarantees they need to make their valuable work available for others to read. We consider the case of comparing a Query Document with a Registered Document. Plagiarism detection techniques are applied differently depending on whether the text is natural language or programming language. In this paper we implement SCAM (Standard Copy Analysis Mechanism), a relative measure for detecting copies that compares the word and line frequencies of a new document against those of registered documents. Our tests compare various articles and show that, in general, this scheme performs well in detecting documents that have Exact, Partial, and Trivial overlap.
Keywords: Plagiarism, SCAM, WordNet, Registered Document, Query Document
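
The frequency-based comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tokenization rule, and the closeness threshold `epsilon` are all assumptions made for illustration. The sketch only shows the general idea of scoring a Query Document against a Registered Document by comparing word frequencies in both directions, so that a document fully contained in another still scores highly.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def subset_measure(freq_a, freq_b, epsilon=2.5):
    """Estimate how much of document A is covered by document B.

    Assumption: a word counts as shared only when its frequencies in
    the two documents are close, i.e. f_a/f_b + f_b/f_a < epsilon.
    """
    denominator = sum(f * f for f in freq_a.values())
    if denominator == 0:
        return 0.0  # an empty document covers nothing
    numerator = 0
    for word, f_a in freq_a.items():
        f_b = freq_b.get(word, 0)
        if f_b == 0:
            continue  # word does not appear in B at all
        if f_a / f_b + f_b / f_a < epsilon:
            numerator += f_a * f_b
    return numerator / denominator


def overlap_score(query_text, registered_text, epsilon=2.5):
    """Score overlap in both directions and keep the larger value, capped at 1."""
    freq_q = word_frequencies(query_text)
    freq_r = word_frequencies(registered_text)
    score = max(
        subset_measure(freq_q, freq_r, epsilon),
        subset_measure(freq_r, freq_q, epsilon),
    )
    return min(score, 1.0)
```

Calling `overlap_score(query_text, registered_text)` returns a value between 0 and 1. Under these assumptions, identical texts and texts where one is wholly contained in the other score 1.0, texts with no words in common score 0.0, and partial overlap falls in between. A line-frequency variant could apply the same measure to counts of whole lines instead of words.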