Influence Analysis of Emotional Behaviors and User Relationships Based on Twitter Data Using Big Data


The Goods and Services Tax (GST) has revolutionized the Indian taxation system and brought a major change to India's financial standards. Twitter is the ninth largest social networking website in the world, largely because people can share information through short messages of up to 140 characters, called tweets. This makes Twitter a valuable source for sentiment and opinion analysis. Tweets are classified as positive, negative, or neutral based on their sentiment. This analysis can be done by classifying the dataset using various machine learning algorithms. One option for performing sentiment analysis in Hadoop is to calculate a sentiment score for each tweet. This paper presents a sentiment analysis of current tweets related to GST.


Many studies on extracting emotion from natural language have been conducted in the past few years. The mainstream goal is to classify emotions and improve accuracy. Classification by positive and negative keyword matching has been used in many studies, and some of these studies also categorized multi-dimensional emotions. The present study focuses on an approach using positive and negative keyword matching.
One earlier study calculated the average emotion score for each user using machine learning, and defined the 25% of users with the highest scores as optimistic users and the 25% with the lowest scores as pessimistic users. It also reported that optimistic users tend to have more social relationships, but tweet less than pessimistic users.
Almost all of the above-mentioned studies were based on the users' own characteristics, such as the behaviour or community formation of positive and negative users. Only a few focused on the reactions and influence caused by emotional expressions. Although some studies analyzed the generative process of emotional tweets, only a few investigated the user relationships influenced by emotions. One such study reported that optimistic users have strong social ties; its claim was based on the existence of many optimists among politicians' followers. However, that study, like many other existing works, did not analyze the relationships among ordinary users.


1. The accuracy calculated for demonization is low because the dataset is insufficient.
2. Data are stored in SQL, so storage and retrieval are slow.

The present study focuses on analyzing whether some correlation exists between emotional behaviours and user relationships.


First, we set conditions, collect users, and select sample users by random sampling. The tweets of these selected users become the Twitter corpus.
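The selection step can be sketched as follows. This is only an illustration: the text does not state the conditions, so the filter used here (a minimum number of tweets per user), the field names `id` and `tweet_count`, and the function name `select_sample_users` are all assumptions.

```python
import random

def select_sample_users(users, min_tweets, sample_size, seed=0):
    """Filter users by a condition, then draw a simple random sample."""
    # Hypothetical condition: keep users with at least `min_tweets` tweets.
    eligible = [u for u in users if u["tweet_count"] >= min_tweets]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Never ask for more users than are eligible.
    k = min(sample_size, len(eligible))
    return rng.sample(eligible, k)

# Illustrative data only, not taken from the study.
users = [
    {"id": "u1", "tweet_count": 120},
    {"id": "u2", "tweet_count": 3},
    {"id": "u3", "tweet_count": 45},
    {"id": "u4", "tweet_count": 300},
]
sample = select_sample_users(users, min_tweets=10, sample_size=2)
```

The tweets of the users in `sample` would then form the corpus.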
Next, we attach an emotion score to each tweet and define a tweet containing a word scored by the two dictionaries as an emotional tweet. Several methods can be used to provide emotion scores. We apply a simple method here: keyword matching between an emotional word dictionary and the morphologically analyzed tweets. The emotional word dictionary holds an emotion score for each word, and we take the sum of the scores of the matched words as the emotion score of the tweet. We do not attach an emotion score to a tweet that has no emotional words.
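A minimal sketch of this keyword-matching score is given below. The words and scores in `EMOTION_DICT` are invented for illustration, the two dictionaries are assumed to be merged into one word-to-score mapping, and the regular-expression tokenizer is only a stand-in for real morphological analysis.

```python
import re

# Hypothetical emotional word dictionary: word -> emotion score.
EMOTION_DICT = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

def tokenize(text):
    # Stand-in for morphological analysis: lowercase alphabetic tokens.
    return re.findall(r"[a-z']+", text.lower())

def emotion_score(text, dictionary=EMOTION_DICT):
    """Sum the scores of matched words; None if no emotional word is found."""
    matched = [dictionary[t] for t in tokenize(text) if t in dictionary]
    if not matched:
        return None  # not an emotional tweet, so no score is attached
    return sum(matched)
```

Returning `None` rather than `0` keeps tweets without emotional words distinguishable from tweets whose positive and negative words cancel out.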
We then calculate the average emotion score for each user. This score is calculated by averaging the emotion scores of randomly selected tweets or all emotional tweets per user. The average emotion score indicates the emotional trend of the user.
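The per-user average can be sketched as below, assuming the input is a sequence of `(user_id, score)` pairs in which `score` is `None` for tweets without emotional words. The function name `average_scores` and the input layout are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

def average_scores(scored_tweets):
    """Map each user to the mean emotion score of their emotional tweets."""
    per_user = defaultdict(list)
    for user_id, score in scored_tweets:
        if score is not None:  # skip tweets with no emotional words
            per_user[user_id].append(score)
    # Users with no emotional tweets never enter per_user, so they get no average.
    return {user: mean(scores) for user, scores in per_user.items()}
```

In this sketch a user with no emotional tweets is left out of the result instead of being given a score of zero.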
After that, we define two groups, namely the positive (P-Group) and negative (N-Group) groups. We sort users in descending order by their average emotion scores and define the top 25% of users as the P-Group and the bottom 25% as the N-Group.
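The grouping can be sketched as follows. Rounding the group size down and leaving ties in sort order are assumptions, since the text does not say how either is handled. The name `split_groups` is hypothetical.

```python
def split_groups(avg_scores, fraction=0.25):
    """Return (P-Group, N-Group) as lists of user ids."""
    # Sort user ids by average emotion score, highest first.
    ranked = sorted(avg_scores, key=avg_scores.get, reverse=True)
    k = int(len(ranked) * fraction)  # group size, rounded down
    p_group = ranked[:k]                  # top 25% of users
    n_group = ranked[len(ranked) - k:]    # bottom 25% of users
    return p_group, n_group

# Illustrative averages only, not taken from the study.
avg = {"u1": 3.0, "u2": -2.0, "u3": 1.5, "u4": 0.0,
       "u5": -0.5, "u6": 2.0, "u7": -3.0, "u8": 0.5}
p_group, n_group = split_groups(avg)
```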
Finally, we apply a statistical test between the positive and negative groups and investigate the influence of emotional behaviours on user relationships on Twitter.
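The text does not name the statistical test or the relationship measure. As one possible illustration, the sketch below runs a two-sided permutation test on the difference in group means, using invented follower counts as the relationship measure. Both the choice of test and the data are assumptions.

```python
import random

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical follower counts for the two groups.
p_followers = [520, 610, 480, 700, 560, 640]
n_followers = [210, 300, 180, 260, 240, 190]
p_value = permutation_test(p_followers, n_followers)
```

A small `p_value` would suggest that the difference between the groups is unlikely to come from chance alone. Other tests could be substituted without changing the rest of the procedure.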


1. Data are classified as positive or negative with good accuracy.
2. Storage and retrieval remain fast even when a large dataset is uploaded.

System : Intel i3
Hard Disk : 500 GB
Mouse : Logitech
RAM : 4 GB
Operating system : 64-bit


Operating system : Linux
Coding Language : Pig script
Database : HDFS
Tool : Pig
Algorithm : SVM

