Authenticity of Linkedin Profiles
Webapp demo
Problem
- Lot of people put false work experience and other information on Linkedin in order to gain attention of the recruiters.
- Given a linkedin profile, determine the evidence of authenticity of information present on the user's linkedin profile from publicly available information about that user on other social networks (for example facebook).
Data Collection
- Approach:
- Given a username on linkedin, we scrape the information on the profile (such as work experience, education etc).
- From the personal information such as name, age, city etc, we then go to facebook and search for similar profiles.
- We try to find the most similar profile from the search result.
- We then scrape the timeline of the user and compare that information with the information on linkedin profile.
- Method:
- Linkedin does not provide any API to get the user profile information.
- Wrote a scraper using selenium to scrape the user profile information.
- We also wrote a script to get all the connections of a user studying at IIIT-H.
- Facebook also does not provide any API to get posts on user timeline.
- Wrote scraper to get all timeline posts on a given user.
- Data-processing:
- Data cleaning (lower casing and stemming)
- Retrieving hashtags from posts
- NER (DBPedia spotlight, spacy NER)
- Tokenize
- Score Calculation
- For one job post, we go through all the facebook posts and calculate following similarity scores::
- Tokenized matching
- Jaccard Similarity on n gram
- Cosine similarity
- Profile match score (matching profile information like name, city, age etc)
- The score is incremented based on the number of reacts on that fb post.
- For cosine similarity we take a thresholding strategy, where we only increment the score when the similarity is above a certain threshold.
- We consider 4 different thresholds i.e 0.6, 0.7, 0.8, 0.9
- Now, we have 5 different scores for a single linkedin post. To obtain a final score we take a weighted mean.
- The final score is calculated by taking the weighted sum of all the similarity scores.
- For one job post, we go through all the facebook posts and calculate following similarity scores::
- Analysis
- Challenges faced
- Coreference resolution of abbreviations. For eg, FB and Facebook are same thing.
- Tokenized matching doesn’t always work. For eg,
- Linkedin post: “Intern at Google”
- Facebook post: “I would love to work at Google”
- Need to figure out if we can find out when two people became friends on facebook.
Code is available at : https://github.com/patniharshit/PSOSM
Team
- Harshit Patni (201501107)
- Aashay Singhal (201502112)
- Amandeep Shahi (201501087)
Comments
Post a Comment