The Science and Psychometric Properties of SocialCapital

The words we use in our daily lives can provide deep insight into our beliefs, values, thinking patterns, and personalities. From the time of Freud to the early days of computer-based text analysis, scientists have been amassing evidence that the words we use have immense psychological value. SocialCapital combines social media data with machine learning to infer human personality traits.

Using a unique identifier such as an email address, name, or Twitter handle, SocialCapital searches the web and finds your online social media footprint. Websites we analyze include Twitter, Facebook, LinkedIn, and personal blogs.

SocialCapital infers personality characteristics from text using an open-vocabulary approach, which reflects the latest trend in personality-inference research (Schwartz et al., 2013; Plank & Hovy, 2015). SocialCapital first tokenizes the input text and then uses open-source word-embedding techniques to map its words to vectors, producing a representation of the text in an n-dimensional space. This representation is fed to a machine-learning algorithm that infers a personality profile based on the Big Five personality model. To train the algorithm, SocialCapital uses personality scores obtained from surveys of thousands of users, together with data from those users' Twitter feeds.
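The tokenize, embed, and predict steps can be sketched as follows. This is a minimal, hypothetical illustration, not SocialCapital's implementation: the tiny `EMBEDDINGS` table stands in for pretrained open-source word vectors, averaging word vectors is just one simple way to turn a text into a single point in n-dimensional space, and ridge regression is swapped in as the unnamed "machine-learning algorithm". In practice, one such model could be trained for each of the five traits.

```python
import re

# Hypothetical toy embeddings with 3 dimensions. A real system would load
# pretrained open-source word vectors with hundreds of dimensions.
EMBEDDINGS = {
    "love": [0.9, 0.1, 0.3],
    "party": [0.8, 0.7, 0.1],
    "friends": [0.7, 0.6, 0.2],
    "art": [0.2, 0.1, 0.9],
    "ideas": [0.1, 0.2, 0.8],
    "alone": [-0.5, -0.6, 0.4],
    "tired": [-0.6, -0.3, 0.0],
}
DIM = 3


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def embed(text):
    """Average the vectors of known tokens (zero vector if none are known)."""
    vectors = [EMBEDDINGS[t] for t in tokenize(text) if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fit_ridge(X, y, lam=0.1):
    """Fit ridge regression with an unpenalized bias term.

    Solves (Xb^T Xb + lam * I') w = Xb^T y, where Xb is X with a column of
    ones appended and I' is the identity with its bias entry set to zero.
    Returns the weights with the bias as the last element.
    """
    Xb = [list(row) + [1.0] for row in X]
    d = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) + (lam if i == j and i < d - 1 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, d):
            factor = A[k][col] / A[col][col]
            for c in range(col, d):
                A[k][c] -= factor * A[col][c]
            b[k] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w, x):
    """Apply fitted weights (bias last) to one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


# Hypothetical training data: users' posts and their survey-based
# extraversion scores on a 1-5 scale.
texts = [
    "love the party with friends",
    "party party friends",
    "alone and tired",
    "art and ideas alone",
]
extraversion = [4.5, 4.8, 1.5, 2.0]
weights = fit_ridge([embed(t) for t in texts], extraversion)
print(predict(weights, embed("friends at the party")))
```

Unknown words such as "the" are simply skipped by `embed`. Ridge regression is used here only because it is a simple, well-understood regressor that behaves sensibly when there are many embedding dimensions and relatively few surveyed users.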

Notes About Personality

When developing the personality detection algorithm, SocialCapital relied on personality surveys to establish ground-truth data for personality inference. In this context, ground truth refers to the scores obtained through written personality surveys. A typical way to measure the accuracy of a machine-learning model is to compare the scores it infers with the ground-truth data.
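One common way to make this comparison, assumed here rather than taken from SocialCapital's documentation, is the Pearson correlation between inferred and survey-based scores for the same users, together with a t statistic for testing whether that correlation differs significantly from zero. The scores in the example are made up for illustration.

```python
import math


def pearson_r(inferred, survey):
    """Pearson correlation between model-inferred and survey-based scores."""
    n = len(inferred)
    if n != len(survey) or n < 3:
        raise ValueError("need two equal-length score lists with at least 3 users")
    mean_i = sum(inferred) / n
    mean_s = sum(survey) / n
    cov = sum((a - mean_i) * (b - mean_s) for a, b in zip(inferred, survey))
    var_i = sum((a - mean_i) ** 2 for a in inferred)
    var_s = sum((b - mean_s) ** 2 for b in survey)
    if var_i == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_i * var_s)


def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical extraversion scores for five users.
inferred = [3.1, 4.2, 2.5, 3.8, 2.9]
survey = [3.0, 4.5, 2.0, 3.5, 3.2]
r = pearson_r(inferred, survey)
print(f"r = {r:.3f}, t = {t_statistic(r, len(survey)):.3f}")
```

The resulting t value is compared against a t distribution with n - 2 degrees of freedom; for large samples, |t| above roughly 1.96 corresponds to significance at the 5% level. A positive, significant r shows that inferred and survey scores move together, not that they agree user by user.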

The following notes clarify the use of personality surveys and survey-based personality estimation:

  • Personality surveys are long and time-consuming to complete. The training data is therefore limited by the number of Twitter users who were willing and available to participate in SocialCapital's study. SocialCapital will continue to gather personality data from more users, as well as from users of other online media such as email, blogs, and forums.

  • Survey-based personality estimation relies on self-reporting, which might not always be a true reflection of one's personality. Some users might give noisy answers to such surveys. To reduce this noise, SocialCapital filtered survey responses by including attention-check questions and discarding responses that failed them, and by discarding surveys that were completed too quickly.

  • While the correlation between inferred and survey-based scores is both positive and significant, the results also show that an individual's inferred scores do not always match their survey-based scores. Researchers outside SocialCapital have also run experiments comparing how well inferred scores match those obtained from surveys, and none reported a fully consistent match.
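The two survey filters described in the notes above (attention checks and a minimum completion time) can be sketched like this. The attention-check item ids, their expected answers, and the 300-second cutoff are hypothetical placeholders; the source does not say which items or time threshold SocialCapital used.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical attention-check items and the only acceptable answer to each,
# e.g. "For this item, please select 'Strongly agree' (5)."
ATTENTION_KEY = {"check_1": 5, "check_2": 1}
# Hypothetical threshold: responses faster than this are treated as careless.
MIN_SECONDS = 300


@dataclass
class SurveyResponse:
    user_id: str
    seconds_taken: float
    answers: Dict[str, int]  # item id -> answer on a 1-5 scale


def passes_checks(resp: SurveyResponse,
                  key: Dict[str, int] = ATTENTION_KEY,
                  min_seconds: float = MIN_SECONDS) -> bool:
    """Reject responses that were too fast or that failed an attention check."""
    if resp.seconds_taken < min_seconds:
        return False
    # A missing attention-check answer counts as a failure.
    return all(resp.answers.get(item) == expected for item, expected in key.items())


def filter_responses(responses: List[SurveyResponse]) -> List[SurveyResponse]:
    """Keep only the responses that pass both filters."""
    return [r for r in responses if passes_checks(r)]


responses = [
    SurveyResponse("u1", 620.0, {"q1": 4, "check_1": 5, "check_2": 1}),
    SurveyResponse("u2", 95.0, {"q1": 2, "check_1": 5, "check_2": 1}),   # too fast
    SurveyResponse("u3", 540.0, {"q1": 3, "check_1": 3, "check_2": 1}),  # failed a check
]
print([r.user_id for r in filter_responses(responses)])  # ['u1']
```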

In general, it is widely accepted in the research literature that self-reported scores from personality surveys do not always fully match scores inferred from text. More important, however, SocialCapital found that characteristics inferred from text can reliably predict a variety of real-world behaviors.

Personality Models

The Big Five personality model is the most widely used model for describing, in general terms, how a person engages with the world. The model includes five primary dimensions:

  • Openness: Openness is a general appreciation for art, emotion, adventure, unusual ideas, imagination, curiosity, and variety of experience. People who are open to experience are intellectually curious, open to emotion, sensitive to beauty, and willing to try new things.

  • Conscientiousness: Conscientiousness is a tendency to show self-discipline, act dutifully, and aim for achievement against measures or outside expectations. It is related to the way in which people control, regulate, and direct their impulses. High scores on conscientiousness indicate a preference for planned rather than spontaneous behavior. The average level of conscientiousness rises among young adults and then declines among older adults.

  • Extraversion: Extraversion is characterized by breadth of activities (as opposed to depth), surgency from external activities and situations, and energy creation from external means. The trait is marked by pronounced engagement with the external world. Extraverts enjoy interacting with people and are often perceived as full of energy. They tend to be enthusiastic, action-oriented individuals. They possess high group visibility, like to talk, and assert themselves.

  • Agreeableness: The agreeableness trait reflects individual differences in general concern for social harmony. Agreeable individuals value getting along with others. They are generally considerate, kind, generous, trusting and trustworthy, helpful, and willing to compromise their interests with others. Agreeable people also have an optimistic view of human nature.

  • Neuroticism: Neuroticism is the tendency to experience negative emotions, such as anger, anxiety, or depression. It is sometimes called emotional instability, or is reversed and referred to as emotional stability. According to Eysenck's (1967) theory of personality, neuroticism is linked to low tolerance for stress or aversive stimuli. Those who score high in neuroticism are emotionally reactive and vulnerable to stress.

Research References

For more information about the research behind the SocialCapital personality assessment method, see the following papers and books.