Social media houses a vast amount of data that can be used for data mining. It has become an inseparable part of daily life, influencing the lifestyle of millions of people. The main objective of this paper is to identify the social behaviour of youth using social media mining techniques. This paper demonstrates how the social and structural textual information of tweets can be exploited to identify trending topics using frequent patterns and to classify the sentiment of tweets using natural language processing. Using social media as a platform, this work gathers and analyzes data to draw out the behavioural patterns of today's youth, which help identify their interests, likes, and dislikes. The experimental results of this approach show the opinions of youth and the public on various matters and persons in different areas and verify the effectiveness of the approach. The proposed technique may help people make decisions and improve performance.
Keywords—Data Mining, Frequent Pattern Mining, Sentiment Analysis, Text Classification, Natural Language Processing
This approach collects a dataset of tweets from random users and classifies them by sentiment using data mining techniques.
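The abstract states that trending topics are identified using frequent patterns, but the paper does not name the mining algorithm. As a minimal sketch, assuming each tweet is treated as a transaction of whitespace-separated lowercase words, the Apriori-style function below counts term sets of size one and two and keeps those whose support (the number of tweets containing every term in the set) reaches a threshold. The function name `frequent_term_sets` and the `min_support` parameter are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import combinations

def frequent_term_sets(tweets, min_support):
    """Return term sets of size 1 and 2 whose support (number of tweets
    containing every term in the set) is at least min_support."""
    # Each tweet is one transaction: the set of its lowercase words.
    transactions = [set(t.lower().split()) for t in tweets]

    # Pass 1: count single terms and keep the frequent ones.
    single = Counter(term for tx in transactions for term in tx)
    frequent_1 = {term for term, c in single.items() if c >= min_support}

    # Pass 2 (Apriori pruning): only pairs of frequent terms can be frequent.
    pairs = Counter()
    for tx in transactions:
        kept = sorted(tx & frequent_1)
        pairs.update(combinations(kept, 2))

    result = {(term,): single[term] for term in frequent_1}
    result.update({p: c for p, c in pairs.items() if c >= min_support})
    return result
```

For example, with the four tweets "election results today", "election debate tonight", "results of the election" and "cricket match today" and `min_support=2`, the function keeps `("election",)`, `("results",)`, `("today",)` and the pair `("election", "results")`. Restricting candidate pairs to frequent single terms is the Apriori idea: a pair cannot occur in more tweets than either of its members.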
Big Data Paradox. Social media data is undoubtedly big, yet we often have little data for each specific individual. We have to exploit the characteristics of social media and use its multidimensional, multisource, and multisite data to aggregate information with sufficient statistics for effective mining.
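One way to read the aggregation idea is sketched below, under the assumption that each source (a site or a data dimension) yields `(user_id, terms)` records: the records are merged into one term-frequency profile per user, and only users whose combined profile holds at least `min_terms` observations are kept. The function and parameter names are hypothetical; the paper does not specify how aggregation is performed.

```python
from collections import Counter, defaultdict

def aggregate_user_profiles(sources, min_terms=1):
    """Merge per-user term counts from several sources into one Counter per
    user, keeping only users with at least `min_terms` observations in total."""
    profiles = defaultdict(Counter)
    for records in sources.values():          # one iterable of records per source
        for user_id, terms in records:
            profiles[user_id].update(terms)   # add this source's terms
    return {user: counts for user, counts in profiles.items()
            if sum(counts.values()) >= min_terms}
```

If `site_a` contributes `["music", "cricket"]` for user `u1` and `site_b` contributes `["music"]`, the merged profile for `u1` counts `music` twice; neither source alone shows that repeated interest.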
Obtaining Sufficient Samples. One of the commonly used methods to collect data is via application programming interfaces (APIs) from social media sites.
Noise Removal Fallacy. In classic data mining literature, a successful data mining exercise entails extensive data preprocessing and noise removal, following the principle of “garbage in, garbage out.”
Evaluation Dilemma. A standard procedure for evaluating patterns in data mining is to have some kind of ground truth.
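When a hand-labelled sample of tweets is available as ground truth (an assumption; the paper does not describe its evaluation set), predicted sentiment labels can be scored against it. The sketch below computes accuracy, plus precision and recall for one class of interest; the function name `evaluate` and the label strings are illustrative.

```python
def evaluate(predicted, gold, positive="positive"):
    """Compare predicted labels with ground-truth labels. Returns accuracy
    and precision/recall for the class named by `positive`."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(predicted, gold))
    correct = sum(p == g for p, g in pairs)
    tp = sum(p == positive and g == positive for p, g in pairs)  # true positives
    fp = sum(p == positive and g != positive for p, g in pairs)  # false positives
    fn = sum(p != positive and g == positive for p, g in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall}
```

Precision and recall matter here because sentiment classes in tweet samples are often unbalanced, and accuracy alone can hide a class the classifier rarely gets right.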
1. An existing Twitter dataset has been collected for preprocessing and analysis with data mining techniques.
The proposed classification system consists of four stages:
(1) Data collection stage: the trending topic, the topic definition, and the related tweets are downloaded to compose a document;
(2) Preprocessing stage: techniques that transform the raw data into an understandable form of better quality;
(3) Data modelling stage: documents are run through a string-to-word vector kernel and converted to tokens;
(4) Sentiment analysis stage: tweets are classified according to their polarity scores.
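Stages 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: stage 1 is omitted because it depends on Twitter API access; a plain `Counter` bag of words stands in for the string-to-word vector step; and polarity is scored with a tiny, hypothetical word lexicon because the paper does not say how its polarity scores are computed.

```python
import re
from collections import Counter

# Hypothetical toy polarity lexicon; a real system would use a full
# sentiment lexicon or a trained model.
LEXICON = {"good": 1, "great": 2, "love": 2, "happy": 1,
           "bad": -1, "terrible": -2, "hate": -2, "sad": -1}

def preprocess(tweet):
    """Stage 2: lowercase, drop URLs, @mentions, '#' signs and punctuation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # user mentions
    text = text.replace("#", " ")               # keep hashtag words, drop '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remaining punctuation
    return text

def to_word_vector(text):
    """Stage 3: tokenize on whitespace and build a bag-of-words count vector."""
    return Counter(text.split())

def polarity(vector):
    """Stage 4a: sum lexicon scores weighted by term counts."""
    return sum(LEXICON.get(term, 0) * count for term, count in vector.items())

def classify(tweet):
    """Stage 4b: map the polarity score to a sentiment label."""
    score = polarity(to_word_vector(preprocess(tweet)))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, "Terrible traffic today #sad" is reduced to the tokens `terrible traffic today sad`, scores -3, and is labelled negative. A lexicon approach needs no training data, which is why it is often used as a baseline before a supervised classifier.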
• Real-time Twitter streaming data has been collected for preprocessing and data mining analysis.
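The paper does not say how the real-time stream is windowed or how trending topics are detected in it. Assuming tweets arrive as `(timestamp, text)` pairs in time order, the sketch below keeps hashtag counts over a sliding window of `window` seconds, so the currently most frequent hashtags can be read at any moment. It is plain Python; in the stated setup this logic would presumably run on Spark's streaming facilities, which are not shown. The class name `SlidingWindowTrends` is hypothetical.

```python
from collections import Counter, deque

class SlidingWindowTrends:
    """Count hashtags seen in the last `window` seconds of a tweet stream."""

    def __init__(self, window):
        self.window = window
        self.events = deque()      # (timestamp, hashtag) in arrival order
        self.counts = Counter()

    def add(self, timestamp, text):
        """Record one tweet's hashtags, then evict expired ones.
        Timestamps are assumed to arrive in non-decreasing order."""
        for word in text.split():
            tag = word.lower().rstrip(".,!?;:")   # normalise case, trailing punctuation
            if tag.startswith("#") and len(tag) > 1:
                self.events.append((timestamp, tag))
                self.counts[tag] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Keep only events in the half-open interval (now - window, now].
        while self.events and self.events[0][0] <= now - self.window:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, n):
        """Return the n most frequent hashtags currently in the window."""
        return self.counts.most_common(n)
```

Each event is appended and evicted once, so the cost per tweet stays proportional to its number of hashtags regardless of how long the stream runs.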
System : Intel Core i3
Hard Disk : 500 GB
Mouse : Logitech
RAM : 4 GB
Operating System : Linux (64-bit)
Coding Language : Spark
Storage : HDFS (Hadoop Distributed File System)
Tool : Spark
Algorithm : SVM
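The requirements list names SVM as the algorithm, but the paper gives neither its formulation nor its features, nor how more than two sentiment classes are handled. As a minimal sketch under those gaps, the code below trains a binary linear SVM (hinge loss with an L2 penalty) using Pegasos-style stochastic sub-gradient descent on bag-of-words features, in plain Python rather than Spark's own `LinearSVC`. For simplicity the bias is a constant feature that is regularized along with the other weights. The names `featurize`, `train_linear_svm`, and `predict` are hypothetical.

```python
import random
from collections import Counter

BIAS = "<bias>"  # constant feature so the hyperplane need not pass through 0

def featurize(text):
    """Sparse bag-of-words vector {term: count} plus a constant bias feature."""
    x = Counter(text.lower().split())
    x[BIAS] = 1
    return x

def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_linear_svm(texts, labels, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic sub-gradient descent. Labels must be +1 or -1."""
    rng = random.Random(seed)
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    w = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)       # Pegasos step size 1/(lambda*t)
            margin = y * dot(w, x)         # uses the weights before this update
            for f in w:                    # L2 shrink: w <- (1 - eta*lam) * w
                w[f] *= 1.0 - eta * lam
            if margin < 1:                 # hinge loss active: move towards y*x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, text):
    """Return +1 or -1 depending on which side of the hyperplane text falls."""
    return 1 if dot(w, featurize(text)) >= 0 else -1
```

Trained on "good great", "great fun", "good fun" (label +1) and "bad awful", "awful boring", "bad boring" (label -1), the model places "good great fun" on the positive side and "bad awful boring" on the negative side. Three-way sentiment (positive, negative, neutral) would need an extension such as one-vs-rest, which the paper does not describe.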