#GeneralElections2019

December 21, 2018

Problem Statement

The course Privacy and Security on Online Social Media taught us that social media is increasingly important for approximating, to a good degree, public sentiment on trending issues. To a certain extent, social media can even be thought of as a virtual simulation of the real world.

Our focus was to tackle one such challenge: General Elections 2019. We wanted to analyse how the sentiments of people who tweeted about the elections in 2014 have changed since then. We planned to use tweet text features, along with account features, to build a classifier that assigns new tweets to classes (political leanings), and to map these results into a form of opinion poll ahead of the elections.

Dataset

The dataset used for our experiments was "Twitter and Polls: Analyzing and estimating political orientation of Twitter users in India General Elections 2014".

Preliminary Analysis

After some preliminary tests on a subset of the data (around 10k randomly chosen tweets), we found that only around 72% of the tweets are still accessible today. Of the remaining 28% that are inaccessible, 38.5% have been deleted, 15.4% belong to users who have since deactivated their accounts, and the remaining 46.2% belong to users suspended by Twitter for rule violations.
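To express this breakdown as a share of the whole sample, the percentages above can be multiplied through. The following is a small arithmetic sketch; the variable names are illustrative, and the inputs are the rounded figures quoted above, so the results are approximate.

```python
# Rounded figures quoted in the text above.
inaccessible_share = 0.28  # fraction of sampled tweets no longer accessible

# Breakdown of the inaccessible tweets by reason (fractions of the 28%).
reasons = {
    "deleted": 0.385,
    "deactivated account": 0.154,
    "suspended account": 0.462,
}

# Share of the whole sample attributable to each reason.
overall = {reason: inaccessible_share * frac for reason, frac in reasons.items()}

for reason, share in overall.items():
    print(f"{reason}: {share:.1%} of all sampled tweets")
```

Under these figures, roughly 11% of the sampled tweets were deleted, about 4% belong to deactivated accounts, and about 13% belong to suspended accounts.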

Deleted tweets may indicate hate speech or even fake news, while the number of deactivated users suggests that many bogus accounts, similar to sock puppets, had joined Twitter purely to push their propaganda. In addition, the number of suspended users points to a large number of malicious users who posted inappropriate content.

Another interesting observation was that the same users who were extremely active during 2014 barely tweet anymore. This could also be because the timespan considered then was the 8 months immediately preceding the elections, while the elections are more than 6 months away as of

Temporal Opinion Analysis

Using keyword search (including hashtags) and sentiment polarities as features, we used Naive Bayes (and later SVM) classifiers to extract the politically aligned tweets from a set of 500 active users, and further distinguished whether each tweet had a pro or anti affiliation to any of the 3 political parties, namely BJP, Congress and AAP. We used a standard list of keywords specific to each political party for more accurate classification. The following table gives the number of tweets in each time span:

	Numbers in First Time Span (2014)	Numbers in Second Time Span (2018)
Politically oriented tweets	17403	5857
Pro-affiliation known	9721	2883
Anti-affiliation known	4390	2135
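Because the two time spans contain very different numbers of tweets, the raw counts are easier to compare as shares of the politically oriented tweets in each span. The sketch below does that normalization; the dictionary layout and the function name `affiliation_shares` are illustrative choices, and only the counts come from the table above.

```python
# Counts copied from the table above.
counts = {
    "2014": {"political": 17403, "pro": 9721, "anti": 4390},
    "2018": {"political": 5857, "pro": 2883, "anti": 2135},
}

def affiliation_shares(span):
    """Return pro and anti counts as fractions of the politically oriented tweets."""
    total = span["political"]
    return {"pro": span["pro"] / total, "anti": span["anti"] / total}

shares = {year: affiliation_shares(span) for year, span in counts.items()}

for year, s in shares.items():
    print(f"{year}: pro {s['pro']:.1%}, anti {s['anti']:.1%}")
```

With these counts, the pro share is about 56% in 2014 and about 49% in 2018, while the anti share is about 25% in 2014 and about 36% in 2018.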

These tweets were further classified as having pro (or anti) affiliation to each of the 3 above-mentioned political parties.
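As a rough illustration of how keyword matches and sentiment polarity can be combined into a per-party pro or anti label, here is a minimal sketch. It is not the Naive Bayes or SVM pipeline we actually used: it swaps the trained classifier for a simple rule so that the example stays self-contained. The keyword lists, the polarity lexicon, and the function names are all hypothetical placeholders, not the lists used in the experiments.

```python
import re

# Hypothetical party keyword lists (the actual lists are not reproduced here).
PARTY_KEYWORDS = {
    "BJP": {"bjp", "modi", "#namo"},
    "Congress": {"congress", "rahul", "#congress"},
    "AAP": {"aap", "kejriwal", "#aap"},
}

# Hypothetical toy polarity lexicon, standing in for a real sentiment scorer.
POSITIVE = {"support", "great", "win", "good"}
NEGATIVE = {"corrupt", "fail", "scam", "bad", "worst"}

# A token is a word, optionally prefixed with '#', so hashtags are kept intact.
TOKEN_RE = re.compile(r"#?\w+")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def extract_features(text):
    """Return the parties mentioned and a crude polarity score for a tweet."""
    tokens = tokenize(text)
    parties = [
        party
        for party, keywords in PARTY_KEYWORDS.items()
        if any(token in keywords for token in tokens)
    ]
    polarity = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {"parties": parties, "polarity": polarity}

def label_tweet(text):
    """Label a tweet as (party, stance) pairs; an empty list means not political."""
    features = extract_features(text)
    if features["polarity"] > 0:
        stance = "pro"
    elif features["polarity"] < 0:
        stance = "anti"
    else:
        stance = "unknown"
    return [(party, stance) for party in features["parties"]]

print(label_tweet("I support #NaMo, great rally"))
print(label_tweet("AAP is the worst, total scam"))
```

In a trained setup, the output of `extract_features` would be fed to the classifier as input features instead of being thresholded by a fixed rule. The "unknown" stance corresponds to politically oriented tweets whose affiliation could not be determined.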

The following are 2 visual representations of only the pro affiliations to each of the 3 parties for both the time spans:

Lastly, the following is a visual representation which captures the pro as well as anti affiliations to each of the 3 political parties for both time spans: