Document Type: Original Research Paper

Authors

1 Department of Computer Engineering, Central Tehran Branch, Islamic Azad University

2 Computer Engineering Department, Central Tehran Branch, Islamic Azad University

Abstract

Big data analytics is one of the most important subjects in computer science. With the rapid expansion of Web technology, large amounts of data are now available to researchers, and extracting information from these data has become a requirement for many organizations and businesses. In recent years, the massive volume of data from the Twitter social network has become a platform for data mining research aimed at discovering facts, trends, and events, and even at predicting some incidents. This paper presents a new framework for clustering and information extraction to analyze sentiment in big data. The proposed method is based on keywords and polarity determination, and employs seven emotional signal groups. The dataset comprises 2,077,610 tweets in English and Persian. We use the Hive tool in the Hadoop environment to cluster the data, and WordNet and SentiWordNet 3.0 to analyze the sentiments of fans of Iranian athletes. Results for the 2016 Olympic and Paralympic events over a one-month period show that this approach achieves high precision and recall compared to other keyword-based sentiment analysis methods. Moreover, big data processing tools such as Hive and Pig show shorter response times than traditional data processing methods for pre-processing, classification, and sentiment analysis of the collected tweets.
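To make the idea of keyword-based polarity determination and its evaluation concrete, the following is a minimal Python sketch. It is not the authors' implementation: the lexicon `TOY_LEXICON`, its scores, and the function names `polarity` and `precision_recall` are hypothetical and chosen only for illustration. The abstract does not specify the seven emotional signal groups or how SentiWordNet 3.0 scores are combined, so neither is modeled here; the sketch simply sums per-word scores and then computes precision and recall for one label.

```python
# Hypothetical toy lexicon: word -> polarity score.
# The paper uses SentiWordNet 3.0; these words and values are invented.
TOY_LEXICON = {
    "great": 0.8,
    "proud": 0.6,
    "win": 0.5,
    "bad": -0.6,
    "lose": -0.5,
    "sad": -0.7,
}


def polarity(tweet, lexicon=TOY_LEXICON):
    """Label a tweet by summing the scores of the lexicon words it contains."""
    tokens = tweet.lower().split()
    # Strip common punctuation and Twitter markers before the lookup.
    score = sum(lexicon.get(tok.strip(".,!?#@"), 0.0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def precision_recall(predicted, actual, label):
    """Precision and recall for `label` over paired predicted/actual labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == label and a == label for p, a in pairs)
    fp = sum(p == label and a != label for p, a in pairs)
    fn = sum(p != label and a == label for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, `polarity("So proud of our team, great win!")` yields `"positive"` under this toy lexicon, and `precision_recall` can then compare such predictions against manually assigned labels.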
