Queen’s Park, Toronto, as seen from Macdonald Block. Photo by Christopher Belanger, Twitter logo copyright Twitter.

Ontario, Meet Your MPPs (Most Prolific Posters)

Analyzing Social Media Use By Elected Officials in Ontario

Last updated on Sep 14, 2021 33 min read

Introduction

How do politicians in Ontario use Twitter? Who do they talk to, and what do they talk about? In this post, we’ll look at what and who politicians care about by analyzing their public statements on social media.

We’ll look at how often and successfully MPPs tweet. We’ll also do some content analysis to see what different parties focus on. And we’ll do some network analysis to look for patterns in who MPPs talk to online.

Basic Ontario Civics

This post assumes some basic familiarity with Ontario and its political parties, but we’ll do some quick reminders.

Our elected politicians are called Members of Provincial Parliament (MPPs).
The Progressive Conservative Party (PCP) holds a majority led by Doug Ford.
The New Democratic Party (NDP) is the official opposition led by Andrea Horwath.
The Liberal Party (LIB) has a few seats and is apparently led by Steven Del Duca.
Mike Schreiner of the Green Party (GRN) holds one seat.
Once elected, PCP MPP Belinda Karahalios split off to create the New Blue Party (NBP), with one seat.
There are also a few other, uh, colourful independent MPPs.

Belinda Karahalios appears to have deleted her Twitter account, so her past tweets are unreadable and I’ll exclude her from the analysis for simplicity. But if the New Blue Party picks up steam we could always include her in a future analysis.

Getting the Data

We need two data sets: each MPP’s Twitter handle, then a set of tweets from each MPP.

Twitter Handles

First we need to know each MPP’s Twitter handle. Strangely, the Office of the Legislative Assembly’s MPP contact page doesn’t have this information, but the Horticultural Trades Association does. Thanks guys!

# manually adjust some party affiliations that weren't up to date in the original data
ind_mpps <- c("Roman Baber", "Rick Nicholls", "Jim Wilson", "Randy Hillier")
nbp_mpps <- c("Belinda Karahalios")

url <- "https://horttrades.com/mpp-twitter-handles"
handle_html <- rvest::read_html(url)

# extract the table, remove the header, then tidy them up and expand them
# Roman Baber is now an Independent MPP
mpp_handles <- tibble::tibble(data = rvest::html_table(handle_html)) %>%
  slice(-1) %>%
  mutate(data = purrr::map(data, rename, name = 1, riding = 2, handle = 3),
         data = purrr::map(data, select, name, riding, handle),
         data = purrr::map(data, filter, name != "NAME"),
         party = c("PCP", "NDP", "LIB", "GRN", "IND")) %>%
  unnest(data) %>%
  mutate(name = stringr::str_remove(name, "Hon. ")) %>%
  filter(is.na(as.numeric(name))) %>% # remove some junk rows that are just numbers
  mutate(handle_txt = stringr::str_remove(handle, "@")) %>%
  mutate(party = if_else(name %in% ind_mpps, "IND", party),
         party = if_else(name %in% nbp_mpps, "NBP", party)) %>%
  mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN", "NBP", "IND")))

Here’s the list of MPPs with ridings and Twitter handles according to the HTA as of September 8, 2021.

# custom function to colour the rows to match the party.
row_colours <- function(x, brighten = 0) {
    for (i in 1:nrow(for_table)) {
      x <- kableExtra::row_spec(x,
                                row = i,
                                background = party_colours(for_table$party[[i]], brighten = brighten))
    }
  x
}

# create a smaller data set just for the table.
# recognize Karahalios as the sole member of the New Blue Party for this table.
for_table <- mpp_handles %>%
  select(name, riding, handle, party) %>%
  arrange(party, name)

# make a nice-ish table.
# DT::datatable() gave jQuery errors--couldn't figure out why.
for_table %>%
  knitr::kable(col.names = c("Name", "Riding", "Twitter Handle", "Party")) %>%
  row_colours(brighten = 120) %>%
  kableExtra::scroll_box(height = "300px")

Name	Riding	Twitter Handle	Party
Amarjot Sandhu	Brampton West	@sandhuamerjot1	PCP
Amy Fee	Kitchener South – Hespeler	@AmyFeePC	PCP
Andrea Khanjin	Barrie – Innisfil	@Andrea_Khanjin	PCP
Aris Babikian	Scarborough – Agincourt	@Aris_Babikian	PCP
Bill Walker	Bruce – Grey – Owen Sound	@billwalkermpp	PCP
Billy Pang	Markham – Unionville	@Billy__Pang	PCP
Caroline Mulroney	York – Simcoe	@C_Mulroney	PCP
Christina Maria Mitas	Scarborough Centre	@Christina_Mitas	PCP
Christine Elliott	Newmarket – Aurora	@celliottability	PCP
Christine Hogarth	Etobicoke – Lakeshore	@CHogarthPC	PCP
Daisy Wai	Richmond Hill	@Daisy_Wai_PC	PCP
Daryl Kramp	Hastings – Lennox and Addington	@darylkramp	PCP
Dave Smith	Peterborough – Kawartha	@DavidSmithMP	PCP
David Piccini	Northumberland – Peterborough South	@DavidPiccini	PCP
Deepak Anand	Mississauga – Malton	@DeepakAnandMPP	PCP
Donna Skelly	Flamborough Glanbrook	@SkellyHamilton	PCP
Doug Downey	Barrie	@douglasdowney	PCP
Doug Ford	Etobicoke North	@fordnation	PCP
Effie Triantafilopoulos	Oakville North – Burlington	@Effie_ONB	PCP
Ernie Hardeman	Oxford	@erniehardeman	PCP
Gila Martow	Thornhill	@GilaMartow	PCP
Goldie Ghamari	Carleton	@gghamari	PCP
Greg Rickford	Kenora – Rainy River	@GregRickford	PCP
Jane McKenna	Burlington		PCP
Jeff Yurek	Elgin – Middlesex – London	@JeffYurekMPP	PCP
Jeremy Roberts	Ottawa West – Nepean	@JR_Ottawa	PCP
Jill Dunlop	Simcoe North	@JillDunlop1	PCP
Jim McDonell	Stormont – Dundas – South Glengarry	@JimMcDonell	PCP
John Yakabuski	Renfrew – Nipissing – Pembroke	@JYakabuskiMPP	PCP
Kaleed Rasheed	Mississauga East – Cooksville	@krasheedmpp	PCP
Kinga Surma	Etobicoke Centre	@KingaSurmaMPP	PCP
Laurie Scott	Haliburton – Kawartha Lakes – Brock	@LaurieScottPC	PCP
Lindsey Park	Durham	@lparkpc	PCP
Lisa MacLeod	Nepean	@MacLeodLisa	PCP
Lisa Thompson	Huron – Bruce	@LisaThompsonMPP	PCP
Logan Kanapathi	Markham – Thornhill	@LoganKanapathi	PCP
Lorne Coe	Whitby	@lornecoe	PCP
Merilee Fullerton	Kanata – Carleton		PCP
Michael Parsa	Aurora – Oak Ridges – Richmond Hill	@MichaelParsa	PCP
Michael Tibollo	Vaughan – Woodbridge	@MichaelTibollo	PCP
Mike Harris	Kitchener – Conestoga	@mikeharrisjrpc	PCP
Monte McNaughton	Lambton – Kent – Middlesex	@MonteMcNaughton	PCP
Natalia Kusendova	Mississauga Centre	@NatKusendova	PCP
Nina Tangri	Mississauga – Streetsville	@ninatangri	PCP
Norman Miller	Parry Sound – Muskoka		PCP
Parm Gill	Milton	@parmgill	PCP
Paul Calandra	Markham – Stouffville	@PaulCalandra	PCP
Peter Bethlenfalvy	Pickering – Uxbridge	@PBethlenfalvy	PCP
Prabmeet Singh Sarkaria	Brampton South	@PrabSarkaria	PCP
Randy Pettapiece	Perth – Wellington	@RandyPettapiece	PCP
Raymond Sung Joon Cho	Scarborough North	@RaymondChoPC	PCP
Robert Bailey	Sarnia – Lambton		PCP
Robin Martin	Eglinton – Lawrence	@RobinMartinPC	PCP
Rod Phillips	Ajax	@RodPhillips01	PCP
Ross Romano	Sault Ste. Marie	@RossRomanoSSM	PCP
Rudy Cuzzetto	Mississauga – Lakeshore	@RudyCuzzetto	PCP
Sam Oosterhoff	Niagara West	@samoosterhoff	PCP
Sheref Sabawy	Mississauga – Erin Mills	@SherefSabawyPC	PCP
Stan Cho	Willowdale	@StanChoMPP	PCP
Stephen Crawford	Oakville	@stcrawford2	PCP
Stephen Lecce	King – Vaughan	@Sflecce	PCP
Steve Clark	Leeds – Grenville – Thousand Islands and Rideau Lakes	@SteveClarkPC	PCP
Sylvia Jones	Dufferin – Caledon		PCP
Ted Arnott	Wellington – Halton Hills	@MPPArnottWHH	PCP
Toby Barrett	Haldimand – Norfolk	@TobyBarrettHN	PCP
Todd Smith	Bay of Quinte	@ToddSmithPC	PCP
Victor Fedeli	Nipissing	@VictorFedeli	PCP
Vijay Thanigaslam	Scarborough – Rouge Park	@VijayThaniMPP	PCP
Vincent Ke	Don Valley North	@vinentkempp	PCP
Will Bouma	Brantford – Brant	@WillBoumaBrant	PCP
Andrea Horwath	Hamilton Centre	@AndreaHorwath	NDP
Bhutila Karpoche	Parkdale – High Park	@BhutilaKarpoche	NDP
Catherine Fife	Waterloo	@CFifeKW	NDP
Chris Glover	Spadine – Fort York	@ChrisGloverMPP	NDP
Doly Begum	Scarborough Southwest	@DolyBegum	NDP
Faisal Hassan	York South – Weston	@FaisalHassanNDP	NDP
France Gélinas	Nickel Belt	@NickelBelt	NDP
Gilles Bisson	Timmins	@BissonGilles	NDP
Gurratan Singh	Brampton East	@GurratanSingh	NDP
Guy Bourgouin	Muchkegowuk – James Bay	@BourgouinGuy	NDP
Ian Arthur	Kingston and the Islands	@IanArthurMPP	NDP
Jamie West	Sudbury	@jamiewestndp	NDP
Jeff Burch	Niagara Centre	@JeffBurch_	NDP
Jennifer French	Oshawa	@jennfrench	NDP
Jennifer Stevens	St. Catharines	@JennieStevens_	NDP
Jessica Bell	University – Rosedale	@JessicaBellTO	NDP
Jill Andrew	Toronto – St. Paul’s	@JILLSLASTWORD	NDP
Joel Harden	Ottawa Centre	@JoelHardenONDP	NDP
John Vanthof	Timiskaming – Cochran	@john_vanthof	NDP
Judith Monteith-Farrell	Thunder Bay – Atiokan	@Judith_NDP	NDP
Kevin Yarde	Brampton North	@KevinYardeNDP	NDP
Laura Mae Lindo	Kitchener Centre	@LauraMaeLindo	NDP
Lisa Gretzky	Windsor West	@LGretzky	NDP
Marit Stiles	Davenport	@maritstiles	NDP
Michael Mantha	Algoma – Manitoulin	@M_Mantha	NDP
Monique Taylor	Hamilton Mountain	@MTaylorNDP	NDP
Paul Miller	Hamilton East – Stoney Creek		NDP
Peggy Sattler	London West	@PeggySattlerNDP	NDP
Percy Hatfield	Windsor – Tecumseh	@PercyHatfield	NDP
Peter Tabuns	Toronto – Danforth	@Peter_Tabuns	NDP
Rima Berns-McGown	Beaches – East York	@beyrima	NDP
Sandy Shaw	Hamilton West – Ancaster – Dundas	@shaw_sandy	NDP
Sara Singh	Brampton Centre	@SaraSinghMPP	NDP
Sol Mamakwa	Kiiwetinoong	@solmamakwa	NDP
Suze Morrison	Toronto Centre	@SuzeMorrison	NDP
Taras Natyshak	Essex	@TarasNatyshak	NDP
Terence Kernaghan	London North Centre	@kernaghant	NDP
Teresa J. Armstrong	London - Fanshawe	@TArmstrongNDP	NDP
Tom Rakecevic	Humber River – Black Creek		NDP
Wayne Gates	Niagara Falls	@Wayne_Gates	NDP
Amanda Simard	Glengarry – Prescott – Russell	@ASimardL	LIB
John Fraser	Ottawa South		LIB
Kathleen Wynne	Don Valley West	@Kathleen_Wynne	LIB
Lucille Collard	Ottawa – Vanier	@LucilleCollard	LIB
Michael Coteau	Don Valley East	@coteau	LIB
Michael Gravelle	Thunder Bay – Superior North	@MichaelGravelle	LIB
Mitzie Hunter	Scarborough – Guildwood	@MitzieHunter	LIB
Stephen Blais	Orleans	@StephenBlais	LIB
Mike Schreiner	Guelph	@MikeSchreiner	GRN
Belinda Karahalios	Cambridge	@KarahaliosPC	NBP
Jim Wilson	Simcoe – Grey	@mppjimwilson	IND
Randy Hillier	Lanark – Frontenac – Kingston	@randyhillier	IND
Rick Nicholls	Chatham – Kent – Leamington	@RickNichollsCKL	IND
Roman Baber	York Centre	@Roman_Baber	IND

# clean up
rm(for_table)

There may have been some additional seat changes, so for this preliminary analysis we’re relying on the data I was able to find. (For a published study we’d be a bit more careful here.)

Tweets

I used R’s rtweets package to get as many tweets as possible for each MPP. I’m using the free Twitter API, so we’re limited here to specific information about each MPP’s past 3,200 tweets. It includes tweet texts, favourites, and mentions, so we can do a lot. But it doesn’t include replies, so unfortunately we won’t be able to find which MPPs are most consistently ratioed (which was actually my original research question).

# doing it with a for loop because of rate limits
# set up an empty list
results <- list()

# get the tweets for each MPP one at a time.
# we will bump up against rate limits! so monitor progress, and once it fails,
# wait 15 minutes and then re-start it at the point of failure.
# note that you need your own token for this code to run :)
for (i in 1:nrow(mpp_handles)){
  message(i)
  result <- NULL
  result <- rtweet::get_timeline(user = mpp_handles$handle_txt[[i]],
                                 n = 3200,
                                 token = token)
  results[[i]] <- result
}

# set up a new tibble with our results
mpp_tweets <- mpp_handles
mpp_tweets$tweets <- results

# then filter out those with no tweets found, and those who found my bot's tweets
# for some reason! likely due to a weird failure condition due to rate limits
# then make party an ordered factor
# also figure out which tweets are not retweets
mpp_tweets <- mpp_tweets %>%
  mutate(num_tweets_found = purrr::map_dbl(tweets, nrow)) %>%#purrr::map_lgl(tweets, function(x) "created_at" %in% colnames(x)),
  filter(num_tweets_found > 0) %>%
  mutate(my_tweets_found = purrr::map_lgl(tweets, function(x) "OttCovidApts2" %in% unique(x$screen_name))) %>%
  filter(!my_tweets_found) %>%
  mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN","IND"))) %>%
  mutate(tweets_no_rt = purrr::map(tweets, filter, !is_retweet),
         num_tweets_no_rt = purrr::map_dbl(tweets_no_rt, nrow))

# save our data here, since it was a pain to collect
save.image(file = "mpp_progress_test.Rdata")

I downloaded these tweets on September 8, 2021, so that’s the time cut-off for this analysis.

Behaviour: How are MPPs Using Twitter?

Let’s look at a few dimensions of online behaviour: how many party members are online, how much original content they post vs. other people’s content, and how often they post and with how much reaction.

Most Online Party

Which party is the most online? We’ll look to see what percentage of elected MPPs have at least one public tweet:

# figure out what % of party members have at least one public tweet
online_stats <- mpp_handles %>%
  group_by(party) %>%
  filter(party != "NBP") %>%
  summarise(num_total = n()) %>%
  left_join(mpp_tweets %>%
              group_by(party) %>%
              summarise(num_online = n()),
            by = "party") %>%
  mutate(pct_online = num_online / num_total) %>%
  mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN","IND")))

# make a static plot
online_plot <- online_stats %>%
  ggplot() +
  geom_col(aes(x=reorder(party, pct_online),
               y = pct_online,
               fill = party,
       text = sprintf("Party: %s\n%% MPPs with at least one public tweet: %%%.1f",
                      party,
                      pct_online*100)),
           ) +
  scale_fill_parties +
  theme_minimal()+
  scale_y_continuous(label = scales::percent) +
  coord_flip() +
  labs(title = "Which Party is the Most Online?",
       subtitle = "Percent of Ontario's party members with at least one public tweet.",
       x = NULL,
       y = NULL,
       fill = "Party") +
  theme(legend.position = "top")

plotly::ggplotly(online_plot, tooltip = "text")

# clean up
rm(online_stats)
rm(online_plot)

All of the independents and the one Green MPP are active on Twitter, but that feels like cheating given the small sample sizes. Of the larger parties, the NDP edges out both the Liberals and the PCs.

But really, all parties are quite online, with over 80% of their members active on Twitter. This suggests that political Twitter could be a rich topic of study, since most MPPs are using it.

Tweets or Retweets?

Let’s look at original tweets vs. retweets, to see whether MPPs are posting their own thoughts or just echoing and amplifying others. An original tweet is written by the MPP in question and posted for the first time, and a retweet is a “a re-posting of a Tweet,” which Twitter says “helps you and others quickly share that Tweet with all of your followers.”

First we can find the percentage of original tweets for each MPP, and plot the results on a histogram to see the distribution of their overall behaviour.

mpp_tweets_orig <- mpp_tweets %>%
  mutate(pct_orig = num_tweets_no_rt/num_tweets_found) %>%
  select(-tweets, -tweets_no_rt)

# make a static histogram of all MPPs together for % of original tweets,
# then make it interactive
# thanks to this SO answer for the inspiration
# https://stackoverflow.com/questions/61109849/is-there-a-way-to-add-the-bin-range-label-into-the-tooltip-for-a-histogram-using
plot_step1 <- ggplot_build(
  mpp_tweets_orig %>% 
  ggplot() + 
  geom_histogram(aes(x=pct_orig), 
                 bins = 11,
                 fill = "darkgrey",
                 colour = "darkgrey")
)$data[[1]]

plot_step2 <- plot_step1 %>% {
  ggplot(data = .,
         aes(x=factor(x), 
             y = count, 
             text = sprintf("Count: %d\nRange: (%.1f-%.1f%%)", y, round(xmin*100,1), round(xmax*100,1)))) + 
  scale_x_discrete(labels = sprintf("%.0f-%.0f%%", .$xmin*100, .$xmax*100)) + 
  geom_bar(stat="identity", 
           width = 1) + 
  labs(title = "Histogram: Percentage of Original MPP Tweets",
       subtitle = "Most MPPs have a fairly even mix of original tweets and retweets.",
       x = "% Original Tweets",
       y = "Count") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45 )) #, vjust = -.05))
  }


# make it interactive
plotly::ggplotly(plot_step2,
tooltip = c("text"))

# clean up
rm(plot_step1)
rm(plot_step2)

Most MPPs have slightly more than 50% original tweets, meaning they write about one original tweet for each retweet. But some do almost nothing but retweet, and some post almost only original content.

Let’s see if there is an obvious difference in tweet/retweet behaviour across the parties with a set of box plots.

# make a set of static boxplots by party for % original tweets
plot_box <- mpp_tweets_orig %>%
  ggplot() +
  geom_boxplot(aes(x=party, y = pct_orig, fill = party)) +
  scale_fill_parties +
  theme_minimal() +
  labs(title = "Percentage of Tweets that are Original (Not Re-Tweets)",
       x = NULL,
       y = NULL,
       fill = "Party") +
  scale_y_continuous(label = scales::percent) +
  theme(legend.position = "none")

plotly::ggplotly(plot_box)

The boxes overlap for the three major parties, but we can still see some interesting population differences. The NDP and Conservatives have much longer tails, showing that they have members who do almost nothing but retweet, and members who produce almost exclusively original content. The Liberals have the most balanced approach, with the majority of members producing at least one original tweet for each retweet.

Tweet Volume, Velocity, and Popularity

Let’s look at how often MPPs tweet and how much engagement they get, defined as the number of favourites and retweets. In this analysis we’re looking only at original tweets, so MPPs won’t get credit for retweeting a 20k-like video of someone else’s cat. We want to know how much organic engagement party members get by themselves.

For each active MPP, the following log-log plot shows their average engagements per tweet vs. how often they tweet on average. Moving right means tweeting more often, and moving up means getting more reactions to each tweet. The size of each circle shows how many tweets we found (with an API-imposed maximum of approximately 3,200).

# function to find out the average tweets per day, average engagements per tweet,
# and number of tweets, and filter out a few erroneous rows
get_twitter_stats <- function(z){
  if (!"created_at" %in% colnames(z)) return (tibble::tibble(num_tweets = 0, tweets_per_day = 0, engagements_avg = 0))
  if ("OttawaCovidAptsDose2" %in% z$source) return (tibble::tibble(num_tweets = 0, tweets_per_day = 0, engagements_avg = 0))
z %>%
  mutate(timespan = max(created_at) - min(created_at),
         timespan = as.numeric(timespan),
         engagements = favorite_count + retweet_count) %>%
 select(timespan, engagements) %>%
  group_by(timespan) %>%
  summarise(num_tweets = n(),
            engagements_avg = mean(engagements)) %>%
  mutate(tweets_per_day = num_tweets/timespan) %>%
  select(-timespan)
}

# get the engagement stats, then unnest them
# also make party a factor with set levels for colouring later
mpp_engagement <- mpp_tweets %>%
  mutate(engagement_stats = purrr::map(tweets_no_rt, get_twitter_stats)) %>%
  unnest(cols = c(engagement_stats)) %>%
  select(-tweets, -tweets_no_rt)  %>%
  mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN","IND")))

# set up axes for the plot
x_breaks <- c(.14, .4, 1.4, 4, 14)
y_breaks <- c(4, 15, 60, 240, 700)
  
# create a static ggplot
plot_engagement <-  mpp_engagement %>%
  ggplot()  +
  geom_segment(aes(x= median(x_breaks), xend = median(x_breaks), y = min(y_breaks), yend = max(y_breaks)), colour = "black", size = 1) +
  geom_segment(aes(x=min(x_breaks), xend = max(x_breaks), y = median(y_breaks), yend = median(y_breaks)), colour = "black", size = 1) +
  annotate("text", x = x_breaks[[2]], y = y_breaks[[2]], label = "Beginners", colour = "#666666", size = 10) +
  annotate("text", x = x_breaks[[2]], y = y_breaks[[4]], label = "Influencers", colour = "#666666", size = 10) +
  annotate("text", x = x_breaks[[4]], y = y_breaks[[2]], label = "Tryhards", colour = "#666666", size = 10) +
  annotate("text", x = x_breaks[[4]], y = y_breaks[[4]], label = "Extremely\nOnline", colour = "#666666", size = 10) +
  geom_point(aes(x=tweets_per_day, 
             y = engagements_avg, 
             size = num_tweets,
             colour = party,
             text = sprintf("%s (%s)\nAvg. Tweets/Day: %.2f\nAvg. Engagements/Tweet: %.2f\nTotal Tweets Found: %d", 
                            name, party, 
                            round(tweets_per_day, digits = 2),
                            round(engagements_avg, digits = 2),
                            num_tweets)),
             alpha = 0.6) +
  scale_x_continuous(trans = "log"
                     , breaks = x_breaks
                     ) +
  scale_y_continuous(trans = "log"
                     , breaks = y_breaks
                     ) +
  theme_minimal() +
  scale_size(guide = "none") +
  labs(title = "Original Tweets: Volume, Velocity, and Engagement",
       x = "Tweets per Day",
       y= "Avg. Engagements per Tweet",
       colour = "Party") +
    scale_colour_discrete(type = c("#1A4782","#F37021","#D71920","#3D9B35", "#CCCCCC"))

# make the plot interactive
plotly::ggplotly(plot_engagement,
                 tooltip = c("text")) %>% 
  plotly::layout(legend = list(orientation = "h", x = 0.29, y = -0.2))

# clean up
rm(plot_engagement)

Noting again the log axes, we can divide this plot into four quadrants representing four different kinds of Twitter users.

The Beginners in the bottom-left quadrant are modest posters. They don’t tweet much, averaging fewer than 1.4 tweets per day, and people tend not to notice much when they do.

The Extremely Online MPPs in the top-right tweet a lot, and people react when they do. The NDP, PC, and Green leaders are there, and Randy’s out in front tweeting up a storm.

The Influencers in the top-left are a rare group who tweet infrequently but generate a big reaction. I browsed their posts, and it’s not hard to see why: they’re high-profile MPPs who make a small number of issues-focused posts.

Finally, we have the Tryhards in the lower-right who post constantly but don’t get much of a reaction. These MPPs are churning out tweets every day, often in both official languages, without anyone really noticing.

Content: What Are MPPs Tweeting About?

Next, let’s look at the content of these tweets. We’ll use the simple approach of dividing MPPs by party, breaking tweets into individual words, and then calculating each word’s relative frequency by party. This way we can compare word usage across parties of different sizes. If you browse the code, we’ll do this with dplyr and a few helper functions from tidytext.

# get tidytext's stopwords and add a few more
stopwords <- tidytext::stop_words %>%
  bind_rows(tibble(word = c("t.co", "amp", "http", "https", "it’s"), 
                   lexicon = "custom"))

# pull out the tweet texts, break into words, remove any numbers, remove
# stopwords, group by party, then count word frequency.
# then get the intra-party relative frequency of each word.
word_freqs <- mpp_tweets %>%
  select(party, tweets_no_rt) %>%
  mutate(text = purrr::map_chr(tweets_no_rt, function(x) pull(x, text) %>% 
                                 unlist() %>% 
                                 stringr::str_flatten(collapse = " "))) %>%
  select(-tweets_no_rt) %>%
  tidytext::unnest_tokens(word, text)  %>%
  filter(is.na(as.numeric(word))) %>%
  anti_join(stopwords, by = "word") %>%
  group_by(word, party) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  group_by(party) %>%
  mutate(total_words = sum(count),
         party_pct = count/total_words) %>%
  select(-total_words)

# now set up some words of interest
word_of_interest <- c("onpoli", "ontario", 
                      #"ford", "premier",
                      "justice", "crime",
                      "environment", "economy"
                      )

# filter for words of interest to make a smaller dataset for plotting
 word_freqs_of_interest <- word_freqs %>%
  filter (word %in% word_of_interest)

The following plot shows word prevalences in original tweets for each party, excluding retweets. I’ve chosen a few keywords for comparison: “Ontario” and “#onpoli”; “justice” and “crime”; and “environment” and “economy.” Note that the y-axes are different for each word, so these plots are only good for relative comparisons.

# make a facet-wrapped set of bar plots for word frequency by party
wordplot <- word_freqs_of_interest %>%
  mutate(word = factor(word, levels = word_of_interest)) %>%
  ggplot() +
  geom_col(aes(x = party, 
               y = party_pct, 
               fill = party,
       text = sprintf("Word: %s\nParty: %s\nPrevalence in %s Tweets: %%%.2f", 
                      word, party, party,
                      (party_pct * 100)
                      ))) +
  facet_wrap(~word, scales = "free_y", ncol = 2) +
  theme_minimal() +
  scale_y_continuous(labels = scales::percent)  +
    scale_fill_discrete(type = c("#1A4782","#F37021","#D71920","#3D9B35", "#CCCCCC")) +
  theme(legend.position = "bottom") +
  labs(title = "Word Prevalence in Original Tweets by MPPs",
       x = NULL,
       y = NULL, #"% of Party Words",
       fill = "Party"
  )

# make the plot interactive
plotly::ggplotly(wordplot, width = 600,
                 height = 500,
                 tooltip = c("text")) %>% 
  plotly::layout(legend = list(orientation = "h", x = 0.2, y = -0.05))

# clean up
rm(wordplot)

With those caveats, we can make a few observations:

ONPoli vs. Ontario: As suspected, the PCs use the general word “Ontario” much more than the other parties, and the other parties use “#ONPoli” much more than the PCs. This might reflect different purposes–making official statements vs. trying to start an online discussion–or it might reflect different levels of power and comfort. Why use a hashtag when everyone is already listening to you?
Justice vs. Crime: The PCs use the word “crime” far more than any other party, and the NDP uses the word “justice” far more. This lines up with what we might expect from a right-wing party with a populist leader and a left-wing party. The surprise here is that the Liberals and Greens barely mention crime at all. Given their third-party status, maybe they’re focusing their messaging elsewhere?
Environment vs. Economy: The details here surprise me. We might expect the NDP’s messaging to focus less on the economy, but I wouldn’t have expected this much of a disparity. We also might have expected the Greens to focus on the environment, but I wouldn’t have expected the PCs to focus on it so much too, or for the NDP to mention it the least out of the four parties.

Without getting into detailed statistics, it looks pretty clear that the parties have different priorities and messaging directives when it comes to their online communications.

Networks: Who Talks to Whom?

Which MPPs interact with which on Twitter? We can explore this question using a network diagram, also called a graph, where each MPP is represented by a circle (or a “node”), and each interaction is represented by a line (or an “edge”). The thickness of each line tells us how often MPPs interact.

We’ll define an interaction as replying to a tweet or mentioning another user in a tweet, since those are the two data points we get from the free Twitter API.

Here we’ll use directed force networks, which incorporate some dynamics: nodes repel each other by default, and attract each other based on the strength of the ties between them. This can help us to see patterns in the data, as the nodes work themselves into a stable configuration. We’ll use the igraph package to make a static picture and networkD3 to make an interactive simulation. For more details I’d suggest Keith McNulty’s free book Handbook of Graphs and Networks in People Analytics.

All Ties

First, we’ll make a directed force network for all interactions between MPPs. Each MPP gets a node, and every pair that has ever interacted gets a line between them. The interactive version runs quite slowly with 108 nodes and over 6,000 edges, so we’ll use igraph to make an unlabeled static image first.

# get all of the mentions, quotes, and replies
# put this in a function so we can use it easily with purrr::map
get_interactions <- function(tweets){
  if (!"screen_name" %in% colnames(tweets)) return("")
  message(tweets$screen_name[[1]])
  select(tweets, ends_with("screen_name"), -screen_name) %>% 
  #select(tweets, mentions_screen_name, -screen_name) %>% #ends_with("screen_name")
  unlist() %>%
  tibble(interaction_screen_name = .) %>%
  drop_na() %>%
  group_by(interaction_screen_name) %>%
  summarise(num = n()) %>%
  arrange(desc(num)) %>%
  filter(interaction_screen_name %in% mpp_handles$handle_txt)
}

# get a long list of the number of interactions from each source to each target
mpp_interactions <- mpp_tweets %>% 
  mutate(i = purrr::map(tweets, get_interactions)) %>%
  select(source_name = name, source_screen_name = handle_txt, i, source_party = party)  %>%
  rowid_to_column(var = "source_id") %>%
  mutate(source_id = source_id - 1) %>%
  unnest(cols = c(i)) %>%
  #select(-i) %>% 
  drop_na() %>%
  rename(target_screen_name = interaction_screen_name)

# extract just the individual node ids to add some information back to the interaction data
mpp_node_ids <- mpp_interactions %>%
  select(node_id = source_id, screen_name = source_screen_name, name = source_name, party = source_party) %>%
  distinct()

# join the target data to the interactions data
mpp_interactions <- mpp_node_ids %>%
  rename(target_id = node_id, target_screen_name = screen_name, target_name = name, target_party = party) %>%
  right_join(mpp_interactions, by = c("target_screen_name")) %>%
  select(starts_with("source"), num, starts_with("target")) %>%
  arrange(source_id, desc(num))

# create a function to prep the data and make a directed force graph for MPPs
# with a given minimum number of interactions
make_directed_force_graph <- function(mpp_interactions){
  # create links for a directed force graph
  force_links <- mpp_interactions %>%
    filter(source_id != target_id) %>%
    mutate(num = num / 1000)
  
  # create nodes for a directed force graph
  force_nodes <- mpp_node_ids %>%
    select(name, party) 
  
  # set up a colour scale
  col_scale <- c( "PCP"= "#1A4782",
                  "NDP"=  "#F37021",
                  "LIB"= "#D71920",
                  "GRN" = "#3D9B35",
                  "IND"=   "#CCCCCC")
  
  
  #["PCP", "NDP", "LIB", "GRN", "IND"]
  #["#1A4782","#F37021","#D71920","#3D9B35", "#CCCCCC"]
  
  forceNetwork(Links = force_links, #ff,
               Nodes = force_nodes, #nn,
               Source = "source_id",
               Target = "target_id",
               Value = "num",
               NodeID = "name",
               Group = "party",
               colourScale = JS('d3.scaleOrdinal()
                              .domain(["PCP", "NDP", "LIB", "GRN" ,"IND"])
                              .range(["blue","orange","red","green", "gray"]);'),
               opacity = 1, 
               opacityNoHover = 0.5,
               arrows = TRUE,
               zoom = TRUE)
  
}

# now call the function with our prepared data
#make_directed_force_graph(mpp_interactions)
# create an igraph object from our data
mpp_graph <- mpp_interactions %>%
  select(source_name, target_name, everything()) %>%
  filter(source_name %in% mpp_node_ids$name,
         target_name %in% mpp_node_ids$name) %>%
  igraph::graph_from_data_frame(vertices = select(mpp_node_ids, name, party))

# create a reactive-force layout
set.seed(1234)
fr <- igraph::layout_with_fr(mpp_graph)

# set the colours
V(mpp_graph)$color <- party_colours(V(mpp_graph)$party)

# remove labels, since we can't read them anyway!
# make the plot
plot(mpp_graph, layout = fr, vertex.label = NA,
     main = "Directed Force Graph for MPP Tweet Engagements")

It looks like almost everyone is interacting with everyone else, at least to some degree. I’m presenting this as an unlabeled static image because the labels were unreadable, and with so many nodes and edges it was slow as an interactive simulation. (There’s an interactive one down below, if you’re impatient.)

But there’s some order to this plot. The graph divides roughly into three camps, with one cluster for the governing PCs and one for the opposition NDP. This suggests that MPPs within each camp interact more with each other than with the opposing party, which binds them together. However, NDP MPPs also direct a lot of attention at high-profile PC MPPs. There are quite a few cabinet members right at the border, including Christine Elliott, Lisa MacLeod, Stephen Lecce, and–naturally–Doug Ford. This suggests that although they are more tightly bound to members of their own party, they attract enough attention from the opposition to pull them to the edge of the cluster.

Most Liberals and the one Green member are clustered roughly between the two main camps, suggesting that they tend to interact equally with both groups.

Only Strong Ties

To find some deeper structure, let’s focus on strong ties between people who interact more often. The histogram below shows the distribution of interaction counts on a log-x axis. Most MPPs interact with a given MPP only infrequently–sometimes only once–but there’s a long right tail.

mpp_interactions %>%
  ggplot() +
  geom_histogram(aes(x=num), 
                 bins = 15,
                 fill = "darkgrey") +
  scale_x_continuous(trans = "log", breaks = c(1,10,100,3000)) +
  theme_minimal() +
  labs(x = "Number of Interactions",
       y = "Count",
       title = "Histogram of Interaction Counts between MPPs on Twitter")

This suggests that we could learn more by focusing on these stronger ties, since people who interact more often probably have a more meaningful political relationship.

Eyeballing the histogram, let’s make a network graph only including people who have interacted more than 25 times. Now the thickness of the line depends on how many interactions they’ve had.

This is an interactive simulation using networkD3, so if you’re on a desktop you can use your mouse to pan around, click and drag nodes to watch them wobble around, and adjust the zoom with your mouse wheel.

mpp_interactions %>%
  filter(num > 25) %>%
  make_directed_force_graph()

Now we’re getting somewhere! There are three clusters for the main parties, showing that MPPs interact most with other party members. The Premier, Doug Ford, is now dragged out into centre stage: almost everyone is talking to or about him. PC Cabinet members aren’t nearly so exposed in this graph, meaning a opposition MPPs interact with them only infrequently. And Randy’s off on his own, yelling at Doug.

Focusing on the leaders, Mike Schreiner interacts only with Premier Doug Ford and the NDP leader, Andrea Horwath. I suppose that befits a party leader. Surprisingly, apart from Schreiner, only NDP MPPs interact strongly with Horwath. I’m not sure if the PCs are hands-off as a matter of party discipline or simply through lack of interest.

Summing Up

We’ve learned quite a bit about how Ontario’s MPPs and political parties use Twitter.

Most parties are quite online. More than 80% of each party’s members are active on Twitter, with at least one public post.
Most MPPs post a lot of original content. Most MPPs post slightly more than one original tweet for each retweet. There’s wide variation though, and some post almost no retweets while others post nearly nothing but.
There’s a lot of variation in MPPs’ volume, velocity, and quality. We looked at the distribution and divided them into four categories–Beginners, Influencers, Extremely Online, and Tryhards.
Parties tweet about different things. We found substantial differences in word frequencies between parties, and some of those differences matched their political positions.
Nearly everyone talks to everyone sometimes. The full network graph of all interactions was extremely interconnected, although as we saw there was some structure along party lines.
Connections are generally strongest between parties, but everyone talks about Doug Ford. We found clear party clusters when we looked only at strong network connections, but the Premier was the clear and singular focus of everyone’s online conversations.

We could go into more detail about any of these (or much more besides!) but this feels like a good place to stop. If you have thoughts, feel free to leave a comment or send me an email.

Christopher Belanger, PhD MBA

Data Scientist
Researcher
Policy Expert

My research interests include data science, marketing, and public policy, bridging the quantitative-qualitative divide.