Document Type: Original Research Paper

Authors

1 Nima Institute, Mahmoodabad, Mazandaran, Iran.

2 Department of Biomedical Informatics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran Iran

Abstract

Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertain data, or use the sliding window model to assess data streams. Sliding window model uses a fixed-size window to only maintain the most recently inserted data and ignores all previous data (or those that are out of its window). Many real-world applications however require maintaining all inserted or obtained data. Therefore, the question arises that whether other window models can be used to find frequent patterns in dynamic streams of uncertain data.
In this paper, we used landmark window model and time-fading model to answer that question. The method presented in the form of proposed algorithm, which uses the idea of landmark window model to find frequent patterns in the relational and uncertain data streams, shows a better performance in finding functional dependencies than other methods in this field. Another advantage of this method compared with other methods is that it shows tuples that do not follow a single dependency. This feature can be used to detect inconsistent data in a data set.

Keywords

Main Subjects

[1] Mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB 1994), Santiago de Chile, Chile, pages 487{499. Morgan Kaufmann Publishers Inc., 1994.
[2] Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. SIGMOD Rec., 29:1{12, May 2000.
[3] Chris Giannella, Jiawei Han, Jian Pei, Xifeng Yan, and Philip S. Yu. Mining frequent patterns in data streams at multiple time granularities. Data Mining: Next Generation Challenges and Future Directions, pages 191{212, 2004.
[4] Carson Kai-Sang Leung and Boyu Hao. Mining of frequent itemsets from streams of uncertain data. In Proceedings of the 2009 IEEE 25th International Conference on Data Engineering (ICDE 2009), Shanghai, China, pages 1663{1670. IEEE Computer Society, 2009.
[5] Carson Kai-Sang Leung, Christopher L. Carmichael, and Boyu Hao. Efficient mining of frequent patterns from uncertain data. In Proceedings of the Seventh IEEE International Conference on Data Mining Workshops (ICDM 2007), Washington, DC, USA, pages 489{494. IEEE Computer Society, 2007.
[6] Chun-Kit Chui, Ben Kao, and Edward Hung. Mining frequent itemsets from uncertain data. In Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2007), Nanjing, China, pages 47{58. Springer-Verlag, 2007.
[7] Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In Proceedings of the 12th Paci_c-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2008), Osaka, Japan, pages 653{661. Springer-Verlag, 2008.
[8] Man Lung Yiu, Nikos Mamoulis, Xiangyuan Dai, Yufei Tao, and Michail Vaitis. E_cient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE Transactions on Knowledge and Data Engineering, 21:108{122, January 2009.
[9] Xiangyuan Dai, Man Lung Yiu, Nikos Mamoulis, and Michail Vaitis. Probabilistic spatial queries on existentially uncertain data. In Proceedings of the 9th International Symposium on Spatial and Temporal Databases (SSTD 2005), Angra dos Reis, Brazil, pages 400{417.Springer-Verlag, 2005.
[10] Chun-Kit Chui and Ben Kao. A decremental approach for mining frequent itemsets from uncertain data. In Proceedings of the 12th Pacific- Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2008), Osaka, Japan, pages 64{75. Springer-Verlag, 2008.
[11] George A. Mihaila, Ioana Stanoi, and Christian A. Lang. Anomaly-free incremental output in stream processing. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008), Napa Valley, California, USA, pages 359{368. ACM, 2008.
[12] Anamika Gupta, Vasudha Bhatnagar, and Naveen Kumar. Mining closed itemsets in data stream using formal concept analysis. In Proceedings of the 12th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2010), Bilbao, Spain, pages 285{296. Springer- Verlag, 2010.
[13] Nan Jiang and Le Gruenwald. Research issues in data stream association rule mining. SIGMOD Rec., 35:14{19, March 2006.
[14] Carson Kai-Sang Leung and Boyu Hao. Mining of frequent itemsets from streams of uncertain data. In Proceedings of the 2009 IEEE 25th International Conference on Data Engineering (ICDE 2009), Shanghai, China, pages 1663{1670. IEEE Computer Society, 2009.
[15] Mahmood Deypir, Mohammad Hadi Sadreddini, An Efficient Algorithm for Mining Frequent Itemsets Within Large Windows Over Data Streams, International Journal of Data Engineering (IJDE), Volume (2) : Issue (3) : 2011.
[16] Li H, Lee S, Shan M. “An efficient algorithm for mining frequent itemsets over the entire history of data streams”. Proceedings of the first international workshop on konwledge discovery in data streams, Pisa, Italy, 2004.
[17] Carson Kai-Sang Leung and Fan Jiang, Frequent Pattern Mining from Time-Fading Streams of Uncertain Data, LNCS 6862, pp. 252–264, 2011.
[18] Mannila, H. and R¨aih¨a, K.-J. (1991).The design of relational databases. Addison-Wesley.
[19] Brockhausen, P. (1994). Discovery of functional and unary inclusion dependencies in relational databases. Master’s thesis, University Dortmund, Informatik VIII. In German.
[20] Anforderungen An , Lehrstuhl Viii , Lehrstuhl Viii , Siegfried Bell , Siegfried Bell Peter Brockhausen , Peter Brockhausen , Fachbereich Informatik , Fachbereich Informatik, Fachbereich Informatik , Fachbereich Informatik , Fachbereich Informatik, Discovery of Data Dependencies in Relational Databases, Machine Learning: ECML-95: 8th European Conference on Machine Learning, Volume 8, 1995.
[21] P. Bosc, L. Lietard, and O. Pivert, Functional dependencies revisited under graduality and imprecision, pp. 57 - 62, NAFIPS, 1997.
[22] F. Berzal, I.Blanco, D. Sanchez, J.M. Serrano and M.A. Vila, A definition for fuzzy approximate dependencies, Fuzzy Sets and Systems, Vol. 149, No. 2, pp. 105 – 129, 2005.
[23] J. Kivinen and H. Mannila, Approximate inference of functional dependencies from relations, Theoretical Computer Science, Vol. 149, No. 1, pp. 129 – 149, 1995.
[24] J.M. Morrissey, Imprecise information and uncertainty in information systems, ACM Transactions on Information Systems, Vol.8, No. 2, pp. 159–180, 1990.
[25] U. Nambiar, and S. Kambhampati, Answering Imprecise Queries over Autonomous Web Databases, In: Proc. ICDE 2006, 22nd International Conference on Data Engineering, 2006.
[26] S.L. Wang, J.W. Shen, and T.P. Hong, Discovering functional dependencies Incrementally from Fuzzy Relational Databases, in: Proc. English National Conference on Fuzzy Theory and Its Applications, pp. 17 – 24, 2000.
[27] S.L. Wang, J.S. Tsai, B.C. Chang, Mining Approximate Dependencies using partitions on Similarity-Relation based Fuzzy databases, in: Proc. IEEE SMC'99, Vol. 6, PP. 871_875, 1999.
[28] S.L. Wang, J.S. Tsai and T.P. Hong, Discovering functional dependencies from Similarity-based Fuzzy Relational Databases, Intelligent Data Analysis, Vol. 5, No. 2, pp. 131 – 149, 2001.
[29] P. A. Flach and I. Savnik, Database dependency discovery: a machine learning approach, AI communications, Vol. 12, No. 3, pp. 139 – 160, 1999.
[30] H. Mannila and K.J. Raiha. Algorithms for inferring functional dependencies from relations. DKE, Vol. 12, No. 1, pp. 83-99, 1994.
[31] S. Lopes, J.M. Petit, L. Lakhal, Efficient Discovery of Functional Dependencies and Armstrong Relations, in: Proc. ICDT 2000, the 7th International Conference on Extending Database Technology: Advances in Database Technology, Vol. 1777, pp. 350 – 364, 2000.
[32] C. Wyss, C. Giannella, and E. Robertson, FastFDs, A Heuristic-Driven Depth-First Algorithm for Mining Functional Dependencies from Relation Instances, Data Warehousing and Knowledge Discovery, Vol. 2114, No. 1, pp. 101 – 123, 2001.
[33] Y. Huhtala, J. Karkkainen, P. Porkka and H. Toivonen, TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. The Computer Journal. Vol. 42, No. 2, pp. 100 – 111, 1999.
[34] Xuan Hong Dang, Kok-Leong Ong, and Vincent Lee, An Adaptive Algorithm for Finding Frequent Sets in Landmark Windows, Springer-Verlag Berlin Heidelberg, pp. 590–597, 2012.
[35] S.M. Fakhr Ahmad, M. Zolghadri Jahromi, M.H. Sadreddini, A new incremental method for discovery of minimal approximate dependencies using logical operations, Journal of Intelligent Data Analysis, Volume 12 pages.607-619, 2008.