US VP contenders' tweet campaign

a text comparison analysis

As an external observer (the author lives on the east side of the Atlantic Ocean and, as a French friend of his said, on the wrong side of the Alps), it is not so easy to understand US elections. Even the colors of Democrats and Republicans are misleading to the eye of a European observer used to matching red with left or reformist parties and blue with right or conservative parties. Moreover, in the European politics the author is aware of, there is no such political role as Vice President. Could a text analysis of the vice presidential contenders' tweets help in understanding? Are there any significant differences between Kamala Harris's and Mike Pence's tweeting?

vice presidential contenders as twitter users

Both candidates have 2 Twitter accounts: one more institutional and the other more private. The author chose to analyze the private ones, hoping that the tweet contents are more open and direct.

Supposing the world were so limited that the only source of information about the 2020 US presidential election was Twitter, the information available about the candidates running for Vice President of the United States of America is reported in the following table:

user info           KamalaHarris                                     Mike_Pence
account_created_at  2009-04-11 00:42:07                              2009-02-27 23:04:51
name                Kamala Harris                                    Mike Pence
location            California                                       (empty)
description         U.S. Senator and Democratic candidate for Vice   Vice President of the United States
                    President of the United States. Wife, Momala,
                    Auntie. Fighting for the people. She/her.
followers_count     7526709                                          5583769
friends_count       735                                              47
listed_count        17828                                            11029
statuses_count      15136                                            12678
favourites_count    362                                              742

Kamala Harris's description is longer and needs at least one explanation: the term “Momala” stands for stepmom (as per this article).

For readers not so familiar with Twitter, here’s the explanation of some of the above stats:

  • followers_count: the number of followers this account currently has;

  • friends_count: the number of users this account is following;

  • listed_count: the number of public lists that this user is a member of;

  • statuses_count: the number of Tweets (including retweets) issued by the user;

  • favourites_count: the number of tweets this user has liked in the account’s lifetime.

By the popularity indicators reported (followers_count and listed_count), Kamala Harris appears to be more popular than Mike Pence.

Mike Pence, on the other hand, liked about twice as many tweets as Kamala Harris (742 versus 362), perhaps showing a more cooperative attitude.

tweeting habits

After retrieving tweets from the beginning of August 2020 to November 4, it is possible to state that, like every politician, Harris and Pence produced more and more tweets as the election approached.

Kamala Harris tweeted at a constant pace over the analyzed time frame, while Mike Pence seems to be more discontinuous. The stats shown below highlight that Mike Pence produced slightly more tweets than Kamala Harris.

screen_name   tot_tweets  min_tweets  avg_tweets  med_tweets  max_tweets
KamalaHarris  753         2           7.6         6           28
Mike_Pence    831         1           9.4         9           31
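A summary like the one in the table can be sketched as follows. This is only an illustration in Python (the post itself relied on R), and it assumes that the min, avg, med and max columns refer to tweets per day and that days without tweets are left out; neither assumption is stated in the post. The function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, median


def daily_tweet_stats(timestamps):
    """Summarise tweets per active day from 'YYYY-MM-DD HH:MM:SS' strings.

    Assumption: the statistics are computed per calendar day, and days
    with no tweets are ignored.
    """
    per_day = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date() for ts in timestamps
    )
    counts = list(per_day.values())
    return {
        "tot_tweets": sum(counts),
        "min_tweets": min(counts),
        "avg_tweets": round(mean(counts), 1),
        "med_tweets": median(counts),
        "max_tweets": max(counts),
    }


# Hypothetical timestamps, not taken from the real data set.
sample = [
    "2020-08-11 20:56:25",
    "2020-08-11 21:10:00",
    "2020-08-11 23:59:59",
    "2020-08-12 09:00:00",
    "2020-08-14 12:30:00",
    "2020-08-14 18:45:00",
]
stats = daily_tweet_stats(sample)
print(stats)
```

Running the same function once per screen name would give one row per contender.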

But the count of tweets published is not a measure of how a candidate is perceived, so the count of likes per tweet is visualized below by candidate, revealing that tweets from Harris are liked the most.

The most popular tweets for Kamala Harris and Mike Pence are the following:

screen_name   created_at           favorite_count  text
KamalaHarris  2020-08-11 20:56:25  660997          .@JoeBiden can unify the American people because he’s spent his life fighting
                                                   for us. And as president, he’ll build an America that lives up to our ideals.
                                                   I’m honored to join him as our party’s nominee for Vice President, and do what
                                                   it takes to make him our Commander-in-Chief.
Mike_Pence    2020-10-02 05:59:30  167505          Karen and I send our love and prayers to our dear friends President
                                                   @realDonaldTrump and @FLOTUS Melania Trump. We join millions across America
                                                   praying for their full and swift recovery. God bless you President Trump & our
                                                   wonderful First Lady Melania.

The most popular tweet from Kamala Harris is the one about her candidacy as VP of the United States back in early August, while the most liked tweet by Mike Pence, the current VP, was posted when President Trump and the First Lady tested positive for COVID-19.

most common words

Exploring the text as a bag of words, that is without considering the context of sentences and tweets, the most common words shared by the two contenders and the most frequent words specific to each contender have been visualized below.

The bar chart below visualizes the words common to both contenders that were written at least 10 times in the analyzed tweets.

Some meaningful differences stand out:

  • Pence wrote the word “president” much more than Harris;

  • Pence wrote “America” and “American” much more than Harris;

  • both contenders mentioned the surnames of the candidates for President, but each mentioned the opponent's more often than their own candidate's;

  • Harris wrote “vote” more often than Pence.

The most specific words, written at least 20 times in tweets, are reported in the graph below.

The most specific word for both contenders is the Twitter screen name used to mention their own candidate for President. The terms that follow identify the values specific to their parties and to the contenders themselves. Harris highlights values like justice and care, while Pence talks of jobs and freedom.

It’s worth noting that Pence mentioned his rival by name (“kamala”), while Harris did not mention hers.
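The bag-of-words counts behind the two charts can be sketched as below. This is a minimal Python illustration, not the R code used for the post. The post does not define how "common" and "specific" were computed, so the sketch assumes that a common word is one used at least `min_count` times by each contender, and a specific word is one used at least `min_count` times by one contender and never by the other. The stop-word list and sample tweets are hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a full one.
STOP_WORDS = {"and", "for", "the", "more"}


def word_counts(tweets):
    """Count lowercase alphabetic tokens, ignoring stop words and word order."""
    counts = Counter()
    for text in tweets:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


def common_words(a, b, min_count):
    """Words used at least min_count times by both contenders (assumed definition)."""
    return sorted(w for w in a if a[w] >= min_count and b[w] >= min_count)


def specific_words(a, b, min_count):
    """Words used at least min_count times in a and never in b (assumed definition)."""
    return sorted(w for w in a if a[w] >= min_count and b[w] == 0)


# Hypothetical tweets, written only to exercise the functions.
harris = word_counts(["Vote early. Vote for justice.", "Justice and care for the people."])
pence = word_counts(["Jobs and freedom for the American people.", "More jobs, more freedom. Vote!"])

print(common_words(harris, pence, min_count=1))
print(specific_words(harris, pence, min_count=2))
print(specific_words(pence, harris, min_count=2))
```

With the thresholds mentioned in the text, `min_count` would be 10 for the common words and 20 for the specific ones.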

word clouds

This kind of text analysis has 2 important limitations:

  • words are out of sentence context;

  • interpretation is always based on an implicit point of view.

That said, the author leaves the interpretation of the following word clouds to the readers.

Kamala Harris word cloud reveals …

Mike Pence word cloud reveals …

sentiment analysis

Using the “bing” lexicon, the sentiment polarization of the two contenders' tweets has been analyzed using the single word as the unit of analysis (a unigram, in text analysis terminology). Sentiment analysis reveals that, overall, Mike Pence's tweets are more positively polarized than Kamala Harris's tweets.
screen_name overall_sentiment
KamalaHarris 278
Mike_Pence 549

This is probably due to the fact that Pence, as current VP, tends to highlight the positive side of the Trump administration, while Kamala Harris tends to criticize it.
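Unigram, lexicon-based scoring of this kind can be sketched as follows. This is a minimal Python illustration, not the R code used for the post. It assumes that a tweet's score is the number of positive words minus the number of negative words, and that the overall score is the sum over all tweets; the post does not spell out the formula. The two word sets are a tiny hypothetical stand-in for the “bing” lexicon, which labels words simply as positive or negative, and the sample tweets are invented.

```python
import re

# Hypothetical mini-lexicon for illustration only; not the real "bing" word lists.
POSITIVE = {"love", "wonderful", "honored", "freedom", "hero"}
NEGATIVE = {"murdered", "shameful", "dark", "wrong", "crime"}


def tweet_sentiment(text):
    """Score one tweet as (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def overall_sentiment(tweets):
    """Sum the per-tweet scores (assumed aggregation)."""
    return sum(tweet_sentiment(t) for t in tweets)


# Invented sample tweets.
sample = [
    "A wonderful day for freedom",
    "A dark and shameful crime",
    "We love our hero",
]
print([tweet_sentiment(t) for t in sample])
print(overall_sentiment(sample))
```

Because each word is scored in isolation, negations and irony are invisible to this method, which is one reason the totals should be read with caution.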

The graph below visualizes the polarization of tweets through time. Harris's tweets became more positively polarized as the election approached.

The most positively and negatively polarized tweets for Kamala Harris and Mike Pence are:

screen_name   created_at           sentiment  favorite_count  text
KamalaHarris  2020-09-13 18:18:01  7          30940           My grandparents were phenomenal. My grandfather
                                                              was a defender of the freedom of India, while my
                                                              grandmother traveled across India to talk to women
                                                              about accessing birth control. Their passion and
                                                              commitment to improving our future led me to where
                                                              I am today. #GrandparentsDay
Mike_Pence    2020-08-14 21:25:05  8          10954           But, as Former President @AlvaroUribeVel is under
                                                              house arrest, we join all freedom loving voices
                                                              around the world in calling on Colombian officials
                                                              to let this Hero, who is a recipient of the US
                                                              Presidential Medal of Freedom, defend himself as a
                                                              free man.
KamalaHarris  2020-08-28 18:56:01  -7         98478           Emmett Till was abducted and brutally murdered on
                                                              this day in . He was only . To this day, lynching
                                                              is still not considered a federal hate crime. Our
                                                              country must work to confront this dark, shameful
                                                              part of our nation’s history and right this wrong.
Mike_Pence    2020-09-24 23:06:42  -5         1378            Under President @realDonaldTrump, in our first
                                                              three years we funded , more police officer
                                                              positions across America through the COPS Program,
                                                              broke records on violent crime and firearm
                                                              prosecutions, and violent crime steadily dropped
                                                              to near historic lows. #CopsForTrump

Harris's positive thoughts go to her family, while Pence's positive attitude is triggered by a foreign policy event. On the opposite, negative side there are two tweets on violence, but seen from two different perspectives.

relationship between words

Another way of analyzing tweet text is to graph the network of bigrams (i.e. two words occurring together) in the same tweets.

Taking the relationships between words into consideration makes the meaning of the tweets more visible. For Kamala Harris, specifically Democratic topics are displayed, such as:

  • covid 19 policies (global pandemic, social distancing, save lives);

  • climate change (climate crisis);

  • gun control (gun violence).

While for Mike Pence the relevant campaign topics are:

  • work (added million jobs manufacturing);

  • fiscal policy (taxes cut);

  • covid 19 policies (effective vaccine);

  • pro-life campaign theme (pro life human president elect).
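The bigram counts that feed such a network graph can be sketched as below. This is a minimal Python illustration, not the R code used for the post. It assumes that bigrams are pairs of adjacent words within one tweet, that pairs containing a stop word are dropped, and that only pairs seen at least `min_count` times become edges of the graph. The stop-word list, the threshold and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a full one.
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in", "we", "must"}


def bigram_counts(tweets):
    """Count adjacent word pairs per tweet, skipping pairs that contain a stop word."""
    counts = Counter()
    for text in tweets:
        words = re.findall(r"[a-z]+", text.lower())
        pairs = zip(words, words[1:])
        counts.update(
            (w1, w2)
            for w1, w2 in pairs
            if w1 not in STOP_WORDS and w2 not in STOP_WORDS
        )
    return counts


def graph_edges(counts, min_count):
    """Keep frequent bigrams as (word1, word2, weight) edges for a network graph."""
    return sorted((w1, w2, n) for (w1, w2), n in counts.items() if n >= min_count)


# Invented sample tweets.
sample = [
    "The climate crisis is real",
    "We must end gun violence",
    "Gun violence and the climate crisis",
]
counts = bigram_counts(sample)
print(graph_edges(counts, min_count=2))
```

Each edge links two word nodes, so frequently repeated expressions such as the topics listed above show up as connected clusters.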

parts of speech

Parts of speech analysis classifies each word in the tweets under analysis by its corresponding part of speech. Because of the cleaning already performed on the tweets' bag of words, the parts of speech that can be found are adjectives, adverbs, nouns, interjections and verbs (intransitive, transitive, usually participle).

Mike Pence tends to use more nouns, adjectives and adverbs in writing tweets, while Kamala Harris uses more verbs in their different forms.

Does this finding mean something in terms of the contenders' personality traits, or is it merely a question of style? It is hard to say.

topic analysis

Topic analysis tries to understand from a text both which terms are associated with which topics and which documents are more likely to present which topics. In this post the analysis is organized so that the tweet corpus of each VP contender represents one document (only two documents are analyzed) and only 2 topics are defined. The 2 topics highlighted by this analysis contain terms that are among the most specific words visualized above.

Topic 2, which emphasizes the America and jobs themes, is specific to Mike Pence, while topic 1, which emphasizes the American people and vote themes, is associated with Kamala Harris.

word vectorization

This last analysis involves a bit of math. The procedure includes:

  • creating a word vocabulary for each contender;

  • computing a co-occurrence matrix for each vocabulary;

  • for the word “vice”, now represented by a numeric vector, determining the 2 most similar words.

Since in this analysis words are represented mathematically as vectors, similarity can be defined as cosine similarity: the vectors with the smallest angle between them are the most similar or, in terms of cosine, the vectors whose cosine is nearest to one.
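The procedure above can be sketched as follows. This is a simplified Python illustration, not the R code used for the post. It assumes that two words co-occur when they appear in the same tweet and it uses each word's row of the co-occurrence matrix directly as its vector. The post does not say how the vectors were derived, and the text2vec package it mentions can also fit GloVe embeddings on top of the co-occurrence counts, so the numbers here are not comparable with the tables in the post. The function names and sample tweets are hypothetical.

```python
import math
import re
from collections import defaultdict


def cooccurrence(tweets):
    """Count how often two distinct words appear in the same tweet."""
    matrix = defaultdict(lambda: defaultdict(int))
    for text in tweets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for w in words:
            for v in words:
                if v != w:
                    matrix[w][v] += 1
    return matrix


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(matrix, word, n=2):
    """Return the n words whose vectors have the highest cosine with `word`."""
    target = matrix[word]
    scores = {
        other: cosine(target, vector)
        for other, vector in matrix.items()
        if other != word
    }
    # Sort by descending similarity, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Invented sample tweets.
sample = ["vice president nomination", "vice president debate", "jobs freedom"]
matrix = cooccurrence(sample)
similar = most_similar(matrix, "vice", n=2)
print(similar)
```

A word compared with itself always gives a cosine of one, which is why “vice” sits at the top of each table below with a similarity of 1.00000.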

The two similar words found by the algorithm in Pence's vocabulary seem to have an easy interpretation.

sim_words cosine_similarity
vice 1.00000
biden 0.60972
joe 0.53962

As the table above shows, the similarity to “vice” is very far from coincidence, which corresponds to a cosine similarity equal to one.

In Kamala Harris's vocabulary the most similar words are presented in the table below.

sim_words cosine_similarity
vice 1.00000
president 0.44587
nomination 0.44179

Even if the similarity is below 0.5, the similar words found by the algorithm are in this case a reasonable fit.

final considerations

Applying quantitative analysis to text is always a challenge, and even more so if you are analyzing short texts like tweets.

Therefore the results must be taken with some skepticism; however, some of them match common sense and the author's limited knowledge of US politics.

After performing a simple text analysis based on concepts such as word frequency, relationship between words, parts of speech and vectorization of words, the following can be stated:

  • tweets are widely used during the election campaign;

  • specificity of the VP contenders can be recovered from the tweet analysis.

Some specificities are related to party membership while others are related to personal background.

Now, on November 5th, after the American elections, the author can venture into an interpretation. Mike Pence appears to better internalize the Vice President’s role by acknowledging Trump’s leadership while Kamala Harris displays a more independent personality.

As a last word, the author wishes to all his American colleagues and friends that the election result, whatever it may be, will help America be an even better place to live and work.

Feel free to email me if you would like to go deeper in the analysis, thanks for reading!


The analyses shown in this post were executed using R as the main computation tool, together with its gorgeous ecosystem. In particular, the text analysis relied on the text2vec, tidytext, tm, topicmodels, widyr and wordcloud packages.