Computer Networks and Distributed Systems
Azam Seilsepour; Reza Ravanmehr; Hamid Reza Sima
Volume 5, Issue 3, August 2019, Pages 143-160
Abstract
Big data analytics is one of the most important subjects in computer science. With the continuing expansion of Web technology, a large amount of data is now available to researchers, and extracting information from these data is a requirement for many organizations and business centers. In recent years, the massive volume of data from the Twitter social network has become a platform for data mining research aimed at discovering facts, trends, and events, and even at predicting some incidents. This paper presents a new framework for clustering and information extraction to analyze sentiment in big data. The proposed method is based on keywords and on polarity determination that employs seven emotional signal groups. The dataset comprises 2,077,610 tweets in English and Persian. We use the Hive tool in the Hadoop environment to cluster the data, and WordNet and SentiWordNet 3.0 to analyze the sentiments of fans of Iranian athletes. Results for the 2016 Olympic and Paralympic events over a one-month period show that this approach achieves high precision and recall compared to other keyword-based methods for sentiment analysis. Moreover, big data processing tools such as Hive and Pig have a shorter response time than traditional data processing methods for pre-processing, classification, and sentiment analysis of the collected tweets.
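To make the idea of keyword-based polarity determination concrete, the following is a minimal sketch in Python, not the authors' framework. The lexicon, its scores, and the function names `polarity` and `summarize` are hypothetical; the paper draws on WordNet and SentiWordNet 3.0 and on seven emotional signal groups, none of which are reproduced here.

```python
from collections import Counter

# Hypothetical toy lexicon mapping keywords to polarity scores.
# A real system would take scores from a resource such as SentiWordNet 3.0.
LEXICON = {
    "great": 1.0, "proud": 0.8, "win": 0.6,
    "lose": -0.6, "bad": -0.8, "terrible": -1.0,
}


def polarity(tweet, lexicon=LEXICON):
    """Label a tweet by summing the scores of the keywords it contains."""
    tokens = (tok.strip(".,!?#") for tok in tweet.lower().split())
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(tweets):
    """Count how many tweets fall under each polarity label."""
    return Counter(polarity(t) for t in tweets)
```

Calling `summarize` on a list of tweet strings returns a count per label, which is the kind of aggregate a clustering stage could then group by keyword or by athlete.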
Computer Networks and Distributed Systems
Avishan Sharafi; Ali Rezaee
Volume 2, Issue 4, November 2016, Pages 17-30
Abstract
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The rack-aware data placement strategy of the Hadoop Distributed File System assumes a homogeneous cluster, in which every node has the same computing capacity and is assigned the same workload. Default Hadoop does not consider the load state of each node when distributing input data blocks. In heterogeneous environments, this data placement policy can cause unnecessary overhead, noticeably reduce MapReduce performance, and increase energy dissipation. This paper proposes a resource-aware adaptive dynamic data placement (ADDP) algorithm. ADDP resolves the unbalanced node workload problem by dynamically adapting and balancing the data stored on each node according to node load status in a heterogeneous Hadoop cluster. Experimental results show that data transfer overhead decreases in comparison with DDP and traditional Hadoop algorithms. Moreover, the proposed method can decrease execution time and improve system throughput by increasing resource utilization.
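As a rough illustration of load-aware data placement, the sketch below distributes input blocks in proportion to each node's spare capacity. It is an assumption-laden toy, not the ADDP algorithm: the function name `place_blocks`, the representation of load as a fraction between 0 and 1, and the proportional rule are all hypothetical choices made for this example.

```python
def place_blocks(num_blocks, node_load):
    """Assign blocks to nodes in proportion to spare capacity.

    node_load maps a node name to its current load as a fraction in [0, 1].
    This proportional rule is an illustrative assumption, not ADDP itself.
    """
    spare = {node: max(0.0, 1.0 - load) for node, load in node_load.items()}
    total = sum(spare.values())
    if total == 0:
        raise ValueError("no node has spare capacity")

    # Ideal (fractional) share per node, then round down to whole blocks.
    exact = {node: num_blocks * s / total for node, s in spare.items()}
    alloc = {node: int(share) for node, share in exact.items()}

    # Hand out the remaining blocks to the largest fractional remainders.
    leftover = num_blocks - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc
```

With this rule, an idle node receives more blocks than a half-loaded one, in contrast to a uniform placement that ignores load. A dynamic scheme would recompute the allocation as load measurements change.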