Comparative study on effect of anonymity and ephemerality in online social media

Motivation


The course, as the name suggests, delves into the importance of privacy while using online social media. In the current age where people from all over the world can access the information we put out in our online profiles at the click of a button, it is important to maintain some sense of privacy in our social activity.




Recently we have even seen some of the leading players in this market, like Facebook and Google, getting grilled by Congress over their privacy policies and data handling (the word "grilled" is used loosely, of course).


This brings us to our topic statement: “Comparative study on effect of anonymity and ephemerality in online social media”. We all hear about the importance of balancing privacy/security against usability in software products, and we wanted to study how privacy affects the usability of online social media platforms. We chose platforms with varying levels of anonymity and ephemerality to see how they compare with each other.

Anonymity here refers to how hidden a user's real identity is when they post on the platform. If users typically post under their real name (as on Twitter, Facebook, Quora), the platform is not anonymous at all. If users post under a username (e.g. reddit, Gab), the platform is pseudo-anonymous: all account activity is tied to a username rather than a real name. A completely anonymous platform (like 4chan's /b/, Secret, Whisper) is one where no username is tied to any post.

Ephemerality refers to how long a post made by a user survives. Essentially, it refers to the life cycle of a post.


                

We chose three platforms to analyze for our project: Twitter, reddit and the random board (/b/) of 4chan. We realized that every platform has its own structure and encourages a certain type of content, so no matter which two platforms we compared, we would get extremely varied topics of discussion. We therefore focus on the sensitivity of the content posted online rather than the content itself.


A brief introduction about our target platforms

















Here are some of the most retweeted tweets of all time







Below is a word cloud of words frequently used by the most-followed users













                                                                                                                           

Reddit gives you the best of the internet in one place. Get a constantly updating feed of breaking news, fun stories, pics, memes, and videos just for you. The account you make is not linked to any other social media and, other than the fact that you are bound to a particular account name, you are anonymous.
Here is what a typical reddit page looks like




Popular is one of the most famous subreddits








Here is a word cloud of some of the most used words on reddit







4chan is one of the most iconic imageboard platforms in the world, where anyone can post comments and share images anonymously. Created in the early 21st century by Christopher Poole, its primary goal was to act as an imageboard for anime. Users can choose to be completely anonymous. It also had a board called random (/b/) where literally anything was allowed. It now has 70 boards. While visiting the board right now may not give you the most pleasant viewing experience, it has undoubtedly left a footprint on the online world. Home to iconic memes like Lolcats and rage comics as well as Rickrolling, it is where a lot of popular online trends find their origins. The random board also famously voted Christopher Poole (its creator) as Time magazine's most influential person of the year in 2009, beating the likes of Obama and Putin. Another attempt to vote for Kim Jong Un for the same award in 2012 was unfortunately thwarted. 4chan users also famously voted to choose a school for the deaf as the place where Taylor Swift was going to perform a concert. The fact that 4chan was so popular while also being anonymous made us choose this platform.





Networks and Data collection



We analyzed Twitter, reddit and 4chan. We used API wrappers to help us collect data from reddit (PRAW) and Twitter (Tweepy/Twython). We used 4chan's read-only JSON API to reconstruct the actual position of each thread on the board.
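To make the 4chan step concrete, here is a minimal sketch of how thread positions could be read from one catalog snapshot. It is not our original collection script. The endpoint URL and the payload shape (a list of pages, each with a "page" number and a "threads" list whose entries carry a thread number "no") are assumptions based on 4chan's public read-only API documentation, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint from 4chan's public read-only API; not quoted in this post.
CATALOG_URL = "https://a.4cdn.org/{board}/catalog.json"


def fetch_catalog(board="b"):
    """Download and parse one catalog snapshot (makes a network call)."""
    with urlopen(CATALOG_URL.format(board=board)) as response:
        return json.load(response)


def thread_positions(catalog):
    """Map thread number -> (page, slot on that page), both 1-based."""
    positions = {}
    for page in catalog:
        for slot, thread in enumerate(page["threads"], start=1):
            positions[thread["no"]] = (page["page"], slot)
    return positions


# Hand-made payload in the assumed catalog shape, so no network is needed.
sample_catalog = [
    {"page": 1, "threads": [{"no": 101}, {"no": 102}]},
    {"page": 2, "threads": [{"no": 103}]},
]
print(thread_positions(sample_catalog))
```

Calling fetch_catalog at a fixed interval and storing each result would give a series of snapshots from which a thread's movement down the board can be followed.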

Analysis on twitter and reddit

We took a random sample of tweets from Twitter and analyzed their favorite and retweet counts. Almost 90% of tweets don't get retweeted or favorited more than 5 times. Around 5% of tweets get more than 20 "reactions". In the random sample we took, the most popular tweet, a GIF about a K-pop artist, had 960 favorites and 532 retweets. Most of the popular Twitter posts became popular immediately, and the people who already follow these accounts are mostly the ones who react to these posts. On reddit, on the other hand, posts picked up traction much more slowly than on Twitter and in most cases only started accelerating in upvotes a few hours after the initial post.

We also compared the posting activity of accounts with a high score. As expected, on Twitter most of the tweets from these accounts received a very high favorite and retweet count. On reddit, comments from such accounts received varied "karma points". Initial popularity is far more important on Twitter than on reddit.

We looked at the karma growth of a sample of posts on various subreddits. Similar to Twitter, a large fraction (80%) received fewer than 10 upvotes. Posts that did eventually gain a high score increased their score steadily for about two hours. After two hours, these posts received an extremely high amount of karma. As expected, this coincides with these posts reaching the hot page of their subreddits and hence getting a lot of upvotes. Below is the karma growth graph.
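The reaction percentages reported for the Twitter sample can be computed with a few lines of Python. This is only an illustrative sketch: it assumes a tweet's "reactions" are its favorites plus its retweets, which the text does not pin down, and the function name and sample data are made up.

```python
def reaction_shares(tweets, low=5, high=20):
    """Return (share of tweets with <= low reactions, share with > high).

    `tweets` is a list of (favorites, retweets) pairs. Treating the sum of
    the two as "reactions" is an assumption made for this sketch.
    """
    if not tweets:
        raise ValueError("need at least one tweet")
    totals = [favorites + retweets for favorites, retweets in tweets]
    quiet = sum(1 for total in totals if total <= low) / len(totals)
    popular = sum(1 for total in totals if total > high) / len(totals)
    return quiet, popular


# Made-up sample, not our collected data.
sample = [(0, 0), (1, 0), (2, 1), (0, 4), (3, 3),
          (12, 9), (40, 10), (0, 0), (5, 0), (1, 1)]
quiet_share, popular_share = reaction_shares(sample)
print(f"<= 5 reactions: {quiet_share:.0%}, > 20 reactions: {popular_share:.0%}")
```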

Analysis in 4chan



Posts are anonymous by default, which makes "internet points" useless on 4chan. 90% of the posts on the random board of 4chan are anonymous. Threads are deleted once they fall off the 10 pages of the board. There are 15 threads on each page. The board looks extremely different with each page refresh. Most threads have a short lifetime and expire within 5 minutes on average. If we plot the number of threads against thread lifetime, the graph resembles a power law.
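One way to estimate thread lifetimes is to poll the board repeatedly and record when each thread is first and last seen. The sketch below assumes that approach and uses made-up snapshot data. The function name is hypothetical, and the measured lifetime is only accurate to within the polling interval.

```python
from statistics import median


def thread_lifetimes(snapshots):
    """Estimate lifetimes of threads that have already expired.

    `snapshots` is a time-ordered list of (timestamp_in_seconds, thread_ids)
    pairs. A thread's lifetime is the gap between its first and last
    sighting; threads still present in the final snapshot are skipped
    because they have not expired yet.
    """
    first_seen, last_seen = {}, {}
    for timestamp, thread_ids in snapshots:
        for thread_id in thread_ids:
            first_seen.setdefault(thread_id, timestamp)
            last_seen[thread_id] = timestamp
    still_alive = snapshots[-1][1] if snapshots else set()
    return {
        thread_id: last_seen[thread_id] - first_seen[thread_id]
        for thread_id in first_seen
        if thread_id not in still_alive
    }


# Made-up snapshots taken 60 seconds apart.
snapshots = [
    (0, {1, 2, 3}),
    (60, {2, 3, 4}),
    (120, {3, 4}),
    (180, {4, 5}),
]
lifetimes = thread_lifetimes(snapshots)
print(lifetimes, "median:", median(lifetimes.values()))
```

Plotting a histogram of these lifetimes on log-log axes is a quick visual check for the power-law shape described above.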




Below is the threads per hour graph
7500 - 9AM, 31250 - 5PM




Below is the threads lifetime graph:


900s - 9AM, 100s - 5PM




Exposure: 22s - 9AM, 3s - 5PM

The fastest thread was gone in 56 seconds, while the slowest lasted about 4 hours. A large percentage of posts get no replies, so a lot of posts disappear in about a minute, especially during times of high traffic. The median post has 2 replies.

There is a feature on 4chan called “sage” which lets you reply without ‘bumping’ the thread to the front page again, but only a small fraction of posts (1%) used this feature. Some users just post words like “bump” to keep the thread active (about 4% of the posts contained this word). Threads last the longest between 9am and 10am EST and expire fastest between 5pm and 7pm EST. There is high activity until 3am or 4am EST. This suggests the site is used primarily by Americans after school/office hours (5PM).

Twitter users usually disclose their real identity. Reddit offers pseudonymity. 4chan's /b/ is mostly anonymous. While reddit doesn't reveal your real name, when we compared word frequencies across the 3 platforms, we found that most communities on reddit used words similar to those on Twitter, while /b/ had a very different set of frequently used words.

We used an existing categorization technique to classify a random sample of threads from our 4chan data. We categorized 300 threads based on the original picture and text of the OP.
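The word-frequency comparison across the three platforms can be illustrated with a small sketch. The post does not say which similarity measure was used, so cosine similarity over raw word counts is an assumption here, as are the simple tokenizer, the function names and the toy posts.

```python
import math
import re
from collections import Counter


def word_counts(posts):
    """Count lowercase word occurrences across a list of post texts."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z']+", post.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy posts for illustration only, not our collected data.
twitter = word_counts(["great game tonight", "love this game"])
reddit = word_counts(["great game thread", "love the game"])
chan = word_counts(["bump", "random thread bump"])

print("twitter vs reddit:", cosine_similarity(twitter, reddit))
print("twitter vs 4chan: ", cosine_similarity(twitter, chan))
```

A higher score means the two platforms share more of their frequent vocabulary. On real data it would make sense to drop common stop words first so they don't dominate the score.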


Conclusion

One of the main aims of our project was to check whether our intuition about the privacy/security versus usability trade-off made sense. After the project, we see that although platforms like 4chan are anonymous, their communities are built in a way that facilitates discussion even under such extreme conditions. This suggests that there are ways of making completely private platforms work. While it remains to be seen whether this can work at extremely large scales, platforms like 4chan show that there is still hope of attaining a perfectly private platform that is usable. It was also really exciting to work on and read papers about something that is extremely relevant to the current world, and especially to the current social media generation. The course was also more focused on the real world, which was a refreshing break from the algorithm-intensive coursework we usually have.

Poster Presentation:









Team Members:


  • easwar
  • himakar




Thank You

