Distributed Data Mining based on Random Projection with Optimal Communication
T. Revathi1, P. Sumathi2
1T. Revathi, Research Scholar, Manonmaniam Sundaranar University & Assistant Professor, Dept. of Computer Science, PSG College of Arts & Science, Coimbatore, India.
2P. Sumathi, Assistant Professor, PG & Research Department, Dept. of Computer Science & Applications, Govt. College of Arts & Science, Coimbatore, India.
Manuscript received on January 01, 2013. | Revised Manuscript received on January 02, 2013. | Manuscript published on January 05, 2013. | PP: 246-251 | Volume-2, Issue-6, January 2013. | Retrieval Number: F1167112612/2013©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Distributed data mining discovers hidden, useful information from data sources distributed among several sites. When mining spans several sites, the privacy of the participating sites becomes a major concern, and sensitive information pertaining to individual sites needs strong protection. Different approaches for mining data securely in a distributed environment have been proposed, but in the existing approaches collusion among participating sites may reveal sensitive information about other sites, and they fall short of their intended purposes: maintaining the privacy of the individual participating sites, reducing computational complexity, and minimizing communication overhead. The proposed method finds global frequent itemsets in a distributed environment with minimal communication among sites and ensures a higher degree of privacy through randomized site selection. The experimental analysis shows that the proposed method generates global frequent itemsets among colluding sites without affecting mining performance and confirms optimal communication among sites.
Keywords: Distributed data mining, privacy, secure multiparty computation, frequent itemsets.