Ontario, Meet Your MPPs (Most Prolific Posters)
Analyzing Social Media Use By Elected Officials in Ontario
Introduction
How do politicians in Ontario use Twitter? Who do they talk to, and what do they talk about? In this post, we’ll look at what and who politicians care about by analyzing their public statements on social media.
We’ll look at how often and successfully MPPs tweet. We’ll also do some content analysis to see what different parties focus on. And we’ll do some network analysis to look for patterns in who MPPs talk to online.
Basic Ontario Civics
This post assumes some basic familiarity with Ontario and its political parties, but we’ll do some quick reminders.
- Our elected politicians are called Members of Provincial Parliament (MPPs).
- The Progressive Conservative Party (PCP) holds a majority led by Doug Ford.
- The New Democratic Party (NDP) is the official opposition led by Andrea Horwath.
- The Liberal Party (LIB) has a few seats and is apparently led by Steven Del Duca.
- Mike Schreiner of the Green Party (GRN) holds one seat.
- Once elected, PCP MPP Belinda Karahalios split off to create the New Blue Party (NBP), with one seat.
- There are also a few other, uh, colourful independent MPPs.
Belinda Karahalios appears to have deleted her Twitter account, so her past tweets are unreadable and I’ll exclude her from the analysis for simplicity. But if the New Blue Party picks up steam we could always include her in a future analysis.
Getting the Data
We need two data sets: each MPP’s Twitter handle, then a set of tweets from each MPP.
Twitter Handles
First we need to know each MPP’s Twitter handle. Strangely, the Office of the Legislative Assembly’s MPP contact page doesn’t have this information, but the Horticultural Trades Association does. Thanks guys!
# manually adjust some party affiliations that weren't up to date in the original data
ind_mpps <- c("Roman Baber", "Rick Nicholls", "Jim Wilson", "Randy Hillier")
nbp_mpps <- c("Belinda Karahalios")
url <- "https://horttrades.com/mpp-twitter-handles"
handle_html <- rvest::read_html(url)
# extract the table, remove the header, then tidy them up and expand them
# Roman Baber is now an Independent MPP
mpp_handles <- tibble::tibble(data = rvest::html_table(handle_html)) %>%
slice(-1) %>%
mutate(data = purrr::map(data, rename, name = 1, riding = 2, handle = 3),
data = purrr::map(data, select, name, riding, handle),
data = purrr::map(data, filter, name != "NAME"),
party = c("PCP", "NDP", "LIB", "GRN", "IND")) %>%
unnest(data) %>%
mutate(name = stringr::str_remove(name, "Hon. ")) %>%
filter(is.na(as.numeric(name))) %>% # remove some junk rows that are just numbers
mutate(handle_txt = stringr::str_remove(handle, "@")) %>%
mutate(party = if_else(name %in% ind_mpps, "IND", party),
party = if_else(name %in% nbp_mpps, "NBP", party)) %>%
mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN", "NBP", "IND")))
Here’s the list of MPPs with ridings and Twitter handles according to the HTA as of September 8, 2021.
# custom function to colour the rows to match the party.
row_colours <- function(x, brighten = 0) {
for (i in 1:nrow(for_table)) {
x <- kableExtra::row_spec(x,
row = i,
background = party_colours(for_table$party[[i]], brighten = brighten))
}
x
}
# create a smaller data set just for the table.
# recognize Karahalios as the sole member of the New Blue Party for this table.
for_table <- mpp_handles %>%
select(name, riding, handle, party) %>%
arrange(party, name)
# make a nice-ish table.
# DT::datatable() gave jQuery errors--couldn't figure out why.
for_table %>%
knitr::kable(col.names = c("Name", "Riding", "Twitter Handle", "Party")) %>%
row_colours(brighten = 120) %>%
kableExtra::scroll_box(height = "300px")
Name | Riding | Twitter Handle | Party |
---|---|---|---|
Amarjot Sandhu | Brampton West | @sandhuamerjot1 | PCP |
Amy Fee | Kitchener South – Hespeler | @AmyFeePC | PCP |
Andrea Khanjin | Barrie – Innisfil | @Andrea_Khanjin | PCP |
Aris Babikian | Scarborough – Agincourt | @Aris_Babikian | PCP |
Bill Walker | Bruce – Grey – Owen Sound | @billwalkermpp | PCP |
Billy Pang | Markham – Unionville | @Billy__Pang | PCP |
Caroline Mulroney | York – Simcoe | @C_Mulroney | PCP |
Christina Maria Mitas | Scarborough Centre | @Christina_Mitas | PCP |
Christine Elliott | Newmarket – Aurora | @celliottability | PCP |
Christine Hogarth | Etobicoke – Lakeshore | @CHogarthPC | PCP |
Daisy Wai | Richmond Hill | @Daisy_Wai_PC | PCP |
Daryl Kramp | Hastings – Lennox and Addington | @darylkramp | PCP |
Dave Smith | Peterborough – Kawartha | @DavidSmithMP | PCP |
David Piccini | Northumberland – Peterborough South | @DavidPiccini | PCP |
Deepak Anand | Mississauga – Malton | @DeepakAnandMPP | PCP |
Donna Skelly | Flamborough Glanbrook | @SkellyHamilton | PCP |
Doug Downey | Barrie | @douglasdowney | PCP |
Doug Ford | Etobicoke North | @fordnation | PCP |
Effie Triantafilopoulos | Oakville North – Burlington | @Effie_ONB | PCP |
Ernie Hardeman | Oxford | @erniehardeman | PCP |
Gila Martow | Thornhill | @GilaMartow | PCP |
Goldie Ghamari | Carleton | @gghamari | PCP |
Greg Rickford | Kenora – Rainy River | @GregRickford | PCP |
Jane McKenna | Burlington | PCP | |
Jeff Yurek | Elgin – Middlesex – London | @JeffYurekMPP | PCP |
Jeremy Roberts | Ottawa West – Nepean | @JR_Ottawa | PCP |
Jill Dunlop | Simcoe North | @JillDunlop1 | PCP |
Jim McDonell | Stormont – Dundas – South Glengarry | @JimMcDonell | PCP |
John Yakabuski | Renfrew – Nipissing – Pembroke | @JYakabuskiMPP | PCP |
Kaleed Rasheed | Mississauga East – Cooksville | @krasheedmpp | PCP |
Kinga Surma | Etobicoke Centre | @KingaSurmaMPP | PCP |
Laurie Scott | Haliburton – Kawartha Lakes – Brock | @LaurieScottPC | PCP |
Lindsey Park | Durham | @lparkpc | PCP |
Lisa MacLeod | Nepean | @MacLeodLisa | PCP |
Lisa Thompson | Huron – Bruce | @LisaThompsonMPP | PCP |
Logan Kanapathi | Markham – Thornhill | @LoganKanapathi | PCP |
Lorne Coe | Whitby | @lornecoe | PCP |
Merilee Fullerton | Kanata – Carleton | PCP | |
Michael Parsa | Aurora – Oak Ridges – Richmond Hill | @MichaelParsa | PCP |
Michael Tibollo | Vaughan – Woodbridge | @MichaelTibollo | PCP |
Mike Harris | Kitchener – Conestoga | @mikeharrisjrpc | PCP |
Monte McNaughton | Lambton – Kent – Middlesex | @MonteMcNaughton | PCP |
Natalia Kusendova | Mississauga Centre | @NatKusendova | PCP |
Nina Tangri | Mississauga – Streetsville | @ninatangri | PCP |
Norman Miller | Parry Sound – Muskoka | PCP | |
Parm Gill | Milton | @parmgill | PCP |
Paul Calandra | Markham – Stouffville | @PaulCalandra | PCP |
Peter Bethlenfalvy | Pickering – Uxbridge | @PBethlenfalvy | PCP |
Prabmeet Singh Sarkaria | Brampton South | @PrabSarkaria | PCP |
Randy Pettapiece | Perth – Wellington | @RandyPettapiece | PCP |
Raymond Sung Joon Cho | Scarborough North | @RaymondChoPC | PCP |
Robert Bailey | Sarnia – Lambton | PCP | |
Robin Martin | Eglinton – Lawrence | @RobinMartinPC | PCP |
Rod Phillips | Ajax | @RodPhillips01 | PCP |
Ross Romano | Sault Ste. Marie | @RossRomanoSSM | PCP |
Rudy Cuzzetto | Mississauga – Lakeshore | @RudyCuzzetto | PCP |
Sam Oosterhoff | Niagara West | @samoosterhoff | PCP |
Sheref Sabawy | Mississauga – Erin Mills | @SherefSabawyPC | PCP |
Stan Cho | Willowdale | @StanChoMPP | PCP |
Stephen Crawford | Oakville | @stcrawford2 | PCP |
Stephen Lecce | King – Vaughan | @Sflecce | PCP |
Steve Clark | Leeds – Grenville – Thousand Islands and Rideau Lakes | @SteveClarkPC | PCP |
Sylvia Jones | Dufferin – Caledon | PCP | |
Ted Arnott | Wellington – Halton Hills | @MPPArnottWHH | PCP |
Toby Barrett | Haldimand – Norfolk | @TobyBarrettHN | PCP |
Todd Smith | Bay of Quinte | @ToddSmithPC | PCP |
Victor Fedeli | Nipissing | @VictorFedeli | PCP |
Vijay Thanigaslam | Scarborough – Rouge Park | @VijayThaniMPP | PCP |
Vincent Ke | Don Valley North | @vinentkempp | PCP |
Will Bouma | Brantford – Brant | @WillBoumaBrant | PCP |
Andrea Horwath | Hamilton Centre | @AndreaHorwath | NDP |
Bhutila Karpoche | Parkdale – High Park | @BhutilaKarpoche | NDP |
Catherine Fife | Waterloo | @CFifeKW | NDP |
Chris Glover | Spadine – Fort York | @ChrisGloverMPP | NDP |
Doly Begum | Scarborough Southwest | @DolyBegum | NDP |
Faisal Hassan | York South – Weston | @FaisalHassanNDP | NDP |
France Gélinas | Nickel Belt | @NickelBelt | NDP |
Gilles Bisson | Timmins | @BissonGilles | NDP |
Gurratan Singh | Brampton East | @GurratanSingh | NDP |
Guy Bourgouin | Muchkegowuk – James Bay | @BourgouinGuy | NDP |
Ian Arthur | Kingston and the Islands | @IanArthurMPP | NDP |
Jamie West | Sudbury | @jamiewestndp | NDP |
Jeff Burch | Niagara Centre | @JeffBurch_ | NDP |
Jennifer French | Oshawa | @jennfrench | NDP |
Jennifer Stevens | St. Catharines | @JennieStevens_ | NDP |
Jessica Bell | University – Rosedale | @JessicaBellTO | NDP |
Jill Andrew | Toronto – St. Paul’s | @JILLSLASTWORD | NDP |
Joel Harden | Ottawa Centre | @JoelHardenONDP | NDP |
John Vanthof | Timiskaming – Cochran | @john_vanthof | NDP |
Judith Monteith-Farrell | Thunder Bay – Atiokan | @Judith_NDP | NDP |
Kevin Yarde | Brampton North | @KevinYardeNDP | NDP |
Laura Mae Lindo | Kitchener Centre | @LauraMaeLindo | NDP |
Lisa Gretzky | Windsor West | @LGretzky | NDP |
Marit Stiles | Davenport | @maritstiles | NDP |
Michael Mantha | Algoma – Manitoulin | @M_Mantha | NDP |
Monique Taylor | Hamilton Mountain | @MTaylorNDP | NDP |
Paul Miller | Hamilton East – Stoney Creek | NDP | |
Peggy Sattler | London West | @PeggySattlerNDP | NDP |
Percy Hatfield | Windsor – Tecumseh | @PercyHatfield | NDP |
Peter Tabuns | Toronto – Danforth | @Peter_Tabuns | NDP |
Rima Berns-McGown | Beaches – East York | @beyrima | NDP |
Sandy Shaw | Hamilton West – Ancaster – Dundas | @shaw_sandy | NDP |
Sara Singh | Brampton Centre | @SaraSinghMPP | NDP |
Sol Mamakwa | Kiiwetinoong | @solmamakwa | NDP |
Suze Morrison | Toronto Centre | @SuzeMorrison | NDP |
Taras Natyshak | Essex | @TarasNatyshak | NDP |
Terence Kernaghan | London North Centre | @kernaghant | NDP |
Teresa J. Armstrong | London - Fanshawe | @TArmstrongNDP | NDP |
Tom Rakecevic | Humber River – Black Creek | NDP | |
Wayne Gates | Niagara Falls | @Wayne_Gates | NDP |
Amanda Simard | Glengarry – Prescott – Russell | @ASimardL | LIB |
John Fraser | Ottawa South | LIB | |
Kathleen Wynne | Don Valley West | @Kathleen_Wynne | LIB |
Lucille Collard | Ottawa – Vanier | @LucilleCollard | LIB |
Michael Coteau | Don Valley East | @coteau | LIB |
Michael Gravelle | Thunder Bay – Superior North | @MichaelGravelle | LIB |
Mitzie Hunter | Scarborough – Guildwood | @MitzieHunter | LIB |
Stephen Blais | Orleans | @StephenBlais | LIB |
Mike Schreiner | Guelph | @MikeSchreiner | GRN |
Belinda Karahalios | Cambridge | @KarahaliosPC | NBP |
Jim Wilson | Simcoe – Grey | @mppjimwilson | IND |
Randy Hillier | Lanark – Frontenac – Kingston | @randyhillier | IND |
Rick Nicholls | Chatham – Kent – Leamington | @RickNichollsCKL | IND |
Roman Baber | York Centre | @Roman_Baber | IND |
# clean up
rm(for_table)
There may have been some additional seat changes, so for this preliminary analysis we’re relying on the data I was able to find. (For a published study we’d be a bit more careful here.)
Tweets
I used R’s rtweets package to get as many tweets as possible for each MPP. I’m using the free Twitter API, so we’re limited here to specific information about each MPP’s past 3,200 tweets. It includes tweet texts, favourites, and mentions, so we can do a lot. But it doesn’t include replies, so unfortunately we won’t be able to find which MPPs are most consistently ratioed (which was actually my original research question).
# doing it with a for loop because of rate limits
# set up an empty list
results <- list()
# get the tweets for each MPP one at a time.
# we will bump up against rate limits! so monitor progress, and once it fails,
# wait 15 minutes and then re-start it at the point of failure.
# note that you need your own token for this code to run :)
for (i in 1:nrow(mpp_handles)){
message(i)
result <- NULL
result <- rtweet::get_timeline(user = mpp_handles$handle_txt[[i]],
n = 3200,
token = token)
results[[i]] <- result
}
# set up a new tibble with our results
mpp_tweets <- mpp_handles
mpp_tweets$tweets <- results
# then filter out those with no tweets found, and those who found my bot's tweets
# for some reason! likely due to a weird failure condition due to rate limits
# then make party an ordered factor
# also figure out which tweets are not retweets
mpp_tweets <- mpp_tweets %>%
mutate(num_tweets_found = purrr::map_dbl(tweets, nrow)) %>%#purrr::map_lgl(tweets, function(x) "created_at" %in% colnames(x)),
filter(num_tweets_found > 0) %>%
mutate(my_tweets_found = purrr::map_lgl(tweets, function(x) "OttCovidApts2" %in% unique(x$screen_name))) %>%
filter(!my_tweets_found) %>%
mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN","IND"))) %>%
mutate(tweets_no_rt = purrr::map(tweets, filter, !is_retweet),
num_tweets_no_rt = purrr::map_dbl(tweets_no_rt, nrow))
# save our data here, since it was a pain to collect
save.image(file = "mpp_progress_test.Rdata")
I downloaded these tweets on September 8, 2021, so that’s the time cut-off for this analysis.
Behaviour: How are MPPs Using Twitter?
Let’s look at a few dimensions of online behaviour: how many party members are online, how much original content they post vs. other people’s content, and how often they post and with how much reaction.
Most Online Party
Which party is the most online? We’ll look to see what percentage of elected MPPs have at least one public tweet:
# figure out what % of party members have at least one public tweet
online_stats <- mpp_handles %>%
group_by(party) %>%
filter(party != "NBP") %>%
summarise(num_total = n()) %>%
left_join(mpp_tweets %>%
group_by(party) %>%
summarise(num_online = n()),
by = "party") %>%
mutate(pct_online = num_online / num_total) %>%
mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN","IND")))
# make a static plot
online_plot <- online_stats %>%
ggplot() +
geom_col(aes(x=reorder(party, pct_online),
y = pct_online,
fill = party,
text = sprintf("Party: %s\n%% MPPs with at least one public tweet: %%%.1f",
party,
pct_online*100)),
) +
scale_fill_parties +
theme_minimal()+
scale_y_continuous(label = scales::percent) +
coord_flip() +
labs(title = "Which Party is the Most Online?",
subtitle = "Percent of Ontario's party members with at least one public tweet.",
x = NULL,
y = NULL,
fill = "Party") +
theme(legend.position = "top")
plotly::ggplotly(online_plot, tooltip = "text")
# clean up
rm(online_stats)
rm(online_plot)
All of the independents and the one Green MPP are active on Twitter, but that feels like cheating given the small sample sizes. Of the larger parties, the NDP edges out both the Liberals and the PCs.
But really, all parties are quite online, with over 80% of their members active on Twitter. This suggests that political Twitter could be a rich topic of study, since most MPPs are using it.
Tweets or Retweets?
Let’s look at original tweets vs. retweets, to see whether MPPs are posting their own thoughts or just echoing and amplifying others. An original tweet is written by the MPP in question and posted for the first time, and a retweet is a “a re-posting of a Tweet,” which Twitter says “helps you and others quickly share that Tweet with all of your followers.”
First we can find the percentage of original tweets for each MPP, and plot the results on a histogram to see the distribution of their overall behaviour.
mpp_tweets_orig <- mpp_tweets %>%
mutate(pct_orig = num_tweets_no_rt/num_tweets_found) %>%
select(-tweets, -tweets_no_rt)
# make a static histogram of all MPPs together for % of original tweets,
# then make it interactive
# thanks to this SO answer for the inspiration
# https://stackoverflow.com/questions/61109849/is-there-a-way-to-add-the-bin-range-label-into-the-tooltip-for-a-histogram-using
plot_step1 <- ggplot_build(
mpp_tweets_orig %>%
ggplot() +
geom_histogram(aes(x=pct_orig),
bins = 11,
fill = "darkgrey",
colour = "darkgrey")
)$data[[1]]
plot_step2 <- plot_step1 %>% {
ggplot(data = .,
aes(x=factor(x),
y = count,
text = sprintf("Count: %d\nRange: (%.1f-%.1f%%)", y, round(xmin*100,1), round(xmax*100,1)))) +
scale_x_discrete(labels = sprintf("%.0f-%.0f%%", .$xmin*100, .$xmax*100)) +
geom_bar(stat="identity",
width = 1) +
labs(title = "Histogram: Percentage of Original MPP Tweets",
subtitle = "Most MPPs have a fairly even mix of original tweets and retweets.",
x = "% Original Tweets",
y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45 )) #, vjust = -.05))
}
# make it interactive
plotly::ggplotly(plot_step2,
tooltip = c("text"))
# clean up
rm(plot_step1)
rm(plot_step2)
Most MPPs have slightly more than 50% original tweets, meaning they write about one original tweet for each retweet. But some do almost nothing but retweet, and some post almost only original content.
Let’s see if there is an obvious difference in tweet/retweet behaviour across the parties with a set of box plots.
# make a set of static boxplots by party for % original tweets
plot_box <- mpp_tweets_orig %>%
ggplot() +
geom_boxplot(aes(x=party, y = pct_orig, fill = party)) +
scale_fill_parties +
theme_minimal() +
labs(title = "Percentage of Tweets that are Original (Not Re-Tweets)",
x = NULL,
y = NULL,
fill = "Party") +
scale_y_continuous(label = scales::percent) +
theme(legend.position = "none")
plotly::ggplotly(plot_box)
The boxes overlap for the three major parties, but we can still see some interesting population differences. The NDP and Conservatives have much longer tails, showing that they have members who do almost nothing but retweet, and members who produce almost exclusively original content. The Liberals have the most balanced approach, with the majority of members producing at least one original tweet for each retweet.
Tweet Volume, Velocity, and Popularity
Let’s look at how often MPPs tweet and how much engagement they get, defined as the number of favourites and retweets. In this analysis we’re looking only at original tweets, so MPPs won’t get credit for retweeting a 20k-like video of someone else’s cat. We want to know how much organic engagement party members get by themselves.
For each active MPP, the following log-log plot shows their average engagements per tweet vs. how often they tweet on average. Moving right means tweeting more often, and moving up means getting more reactions to each tweet. The size of each circle shows how many tweets we found (with an API-imposed maximum of approximately 3,200).
# function to find out the average tweets per day, average engagements per tweet,
# and number of tweets, and filter out a few erroneous rows
get_twitter_stats <- function(z){
if (!"created_at" %in% colnames(z)) return (tibble::tibble(num_tweets = 0, tweets_per_day = 0, engagements_avg = 0))
if ("OttawaCovidAptsDose2" %in% z$source) return (tibble::tibble(num_tweets = 0, tweets_per_day = 0, engagements_avg = 0))
z %>%
mutate(timespan = max(created_at) - min(created_at),
timespan = as.numeric(timespan),
engagements = favorite_count + retweet_count) %>%
select(timespan, engagements) %>%
group_by(timespan) %>%
summarise(num_tweets = n(),
engagements_avg = mean(engagements)) %>%
mutate(tweets_per_day = num_tweets/timespan) %>%
select(-timespan)
}
# get the engagement stats, then unnest them
# also make party a factor with set levels for colouring later
mpp_engagement <- mpp_tweets %>%
mutate(engagement_stats = purrr::map(tweets_no_rt, get_twitter_stats)) %>%
unnest(cols = c(engagement_stats)) %>%
select(-tweets, -tweets_no_rt) %>%
mutate(party = factor(party, levels = c("PCP","NDP","LIB","GRN","IND")))
# set up axes for the plot
x_breaks <- c(.14, .4, 1.4, 4, 14)
y_breaks <- c(4, 15, 60, 240, 700)
# create a static ggplot
plot_engagement <- mpp_engagement %>%
ggplot() +
geom_segment(aes(x= median(x_breaks), xend = median(x_breaks), y = min(y_breaks), yend = max(y_breaks)), colour = "black", size = 1) +
geom_segment(aes(x=min(x_breaks), xend = max(x_breaks), y = median(y_breaks), yend = median(y_breaks)), colour = "black", size = 1) +
annotate("text", x = x_breaks[[2]], y = y_breaks[[2]], label = "Beginners", colour = "#666666", size = 10) +
annotate("text", x = x_breaks[[2]], y = y_breaks[[4]], label = "Influencers", colour = "#666666", size = 10) +
annotate("text", x = x_breaks[[4]], y = y_breaks[[2]], label = "Tryhards", colour = "#666666", size = 10) +
annotate("text", x = x_breaks[[4]], y = y_breaks[[4]], label = "Extremely\nOnline", colour = "#666666", size = 10) +
geom_point(aes(x=tweets_per_day,
y = engagements_avg,
size = num_tweets,
colour = party,
text = sprintf("%s (%s)\nAvg. Tweets/Day: %.2f\nAvg. Engagements/Tweet: %.2f\nTotal Tweets Found: %d",
name, party,
round(tweets_per_day, digits = 2),
round(engagements_avg, digits = 2),
num_tweets)),
alpha = 0.6) +
scale_x_continuous(trans = "log"
, breaks = x_breaks
) +
scale_y_continuous(trans = "log"
, breaks = y_breaks
) +
theme_minimal() +
scale_size(guide = "none") +
labs(title = "Original Tweets: Volume, Velocity, and Engagement",
x = "Tweets per Day",
y= "Avg. Engagements per Tweet",
colour = "Party") +
scale_colour_discrete(type = c("#1A4782","#F37021","#D71920","#3D9B35", "#CCCCCC"))
# make the plot interactive
plotly::ggplotly(plot_engagement,
tooltip = c("text")) %>%
plotly::layout(legend = list(orientation = "h", x = 0.29, y = -0.2))
# clean up
rm(plot_engagement)
Noting again the log axes, we can divide this plot into four quadrants representing four different kinds of Twitter users.
The Beginners in the bottom-left quadrant are modest posters. They don’t tweet much, averaging fewer than 1.4 tweets per day, and people tend not to notice much when they do.
The Extremely Online MPPs in the top-right tweet a lot, and people react when they do. The NDP, PC, and Green leaders are there, and Randy’s out in front tweeting up a storm.
The Influencers in the top-left are a rare group who tweet infrequently but generate a big reaction. I browsed their posts, and it’s not hard to see why: they’re high-profile MPPs who make a small number of issues-focused posts.
Finally, we have the Tryhards in the lower-right who post constantly but don’t get much of a reaction. These MPPs are churning out tweets every day, often in both official languages, without anyone really noticing.
Content: What Are MPPs Tweeting About?
Next, let’s look at the content of these tweets. We’ll use the simple approach of dividing MPPs by party, breaking tweets into individual words, and then calculating each word’s relative frequency by party. This way we can compare word usage across parties of different sizes. If you browse the code, we’ll do this with dplyr and a few helper functions from tidytext.
# get tidytext's stopwords and add a few more
stopwords <- tidytext::stop_words %>%
bind_rows(tibble(word = c("t.co", "amp", "http", "https", "it’s"),
lexicon = "custom"))
# pull out the tweet texts, break into words, remove any numbers, remove
# stopwords, group by party, then count word frequency.
# then get the intra-party relative frequency of each word.
word_freqs <- mpp_tweets %>%
select(party, tweets_no_rt) %>%
mutate(text = purrr::map_chr(tweets_no_rt, function(x) pull(x, text) %>%
unlist() %>%
stringr::str_flatten(collapse = " "))) %>%
select(-tweets_no_rt) %>%
tidytext::unnest_tokens(word, text) %>%
filter(is.na(as.numeric(word))) %>%
anti_join(stopwords, by = "word") %>%
group_by(word, party) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
group_by(party) %>%
mutate(total_words = sum(count),
party_pct = count/total_words) %>%
select(-total_words)
# now set up some words of interest
word_of_interest <- c("onpoli", "ontario",
#"ford", "premier",
"justice", "crime",
"environment", "economy"
)
# filter for words of interest to make a smaller dataset for plotting
word_freqs_of_interest <- word_freqs %>%
filter (word %in% word_of_interest)
The following plot shows word prevalences in original tweets for each party, excluding retweets. I’ve chosen a few keywords for comparison: “Ontario” and “#onpoli”; “justice” and “crime”; and “environment” and “economy.” Note that the y-axes are different for each word, so these plots are only good for relative comparisons.
# make a facet-wrapped set of bar plots for word frequency by party
wordplot <- word_freqs_of_interest %>%
mutate(word = factor(word, levels = word_of_interest)) %>%
ggplot() +
geom_col(aes(x = party,
y = party_pct,
fill = party,
text = sprintf("Word: %s\nParty: %s\nPrevalence in %s Tweets: %%%.2f",
word, party, party,
(party_pct * 100)
))) +
facet_wrap(~word, scales = "free_y", ncol = 2) +
theme_minimal() +
scale_y_continuous(labels = scales::percent) +
scale_fill_discrete(type = c("#1A4782","#F37021","#D71920","#3D9B35", "#CCCCCC")) +
theme(legend.position = "bottom") +
labs(title = "Word Prevalence in Original Tweets by MPPs",
x = NULL,
y = NULL, #"% of Party Words",
fill = "Party"
)
# make the plot interactive
plotly::ggplotly(wordplot, width = 600,
height = 500,
tooltip = c("text")) %>%
plotly::layout(legend = list(orientation = "h", x = 0.2, y = -0.05))
# clean up
rm(wordplot)
With those caveats, we can make a few observations:
- ONPoli vs. Ontario: As suspected, the PCs use the general word “Ontario” much more than the other parties, and the other parties use “#ONPoli” much more than the PCs. This might reflect different purposes–making official statements vs. trying to start an online discussion–or it might reflect different levels of power and comfort. Why use a hashtag when everyone is already listening to you?
- Justice vs. Crime: The PCs use the word “crime” far more than any other party, and the NDP uses the word “justice” far more. This lines up with what we might expect from a right-wing party with a populist leader and a left-wing party. The surprise here is that the Liberals and Greens barely mention crime at all. Given their third-party status, maybe they’re focusing their messaging elsewhere?
- Environment vs. Economy: The details here surprise me. We might expect the NDP’s messaging to focus less on the economy, but I wouldn’t have expected this much of a disparity. We also might have expected the Greens to focus on the environment, but I wouldn’t have expected the PCs to focus on it so much too, or for the NDP to mention it the least out of the four parties.
Without getting into detailed statistics, it looks pretty clear that the parties have different priorities and messaging directives when it comes to their online communications.
Networks: Who Talks to Whom?
Which MPPs interact with which on Twitter? We can explore this question using a network diagram, also called a graph, where each MPP is represented by a circle (or a “node”), and each interaction is represented by a line (or an “edge”). The thickness of each line tells us how often MPPs interact.
We’ll define an interaction as replying to a tweet or mentioning another user in a tweet, since those are the two data points we get from the free Twitter API.
Here we’ll use directed force networks, which incorporate some dynamics: nodes repel each other by default, and attract each other based on the strength of the ties between them. This can help us to see patterns in the data, as the nodes work themselves into a stable configuration. We’ll use the igraph package to make a static picture and networkD3 to make an interactive simulation. For more details I’d suggest Keith McNulty’s free book Handbook of Graphs and Networks in People Analytics.
All Ties
First, we’ll make a directed force network for all interactions between MPPs. Each MPP gets a node, and every pair that has ever interacted gets a line between them. The interactive version runs quite slowly with 108 nodes and over 6,000 edges, so we’ll use igraph to make an unlabeled static image first.
# get all of the mentions, quotes, and replies
# put this in a function so we can use it easily with purrr::map
get_interactions <- function(tweets){
if (!"screen_name" %in% colnames(tweets)) return("")
message(tweets$screen_name[[1]])
select(tweets, ends_with("screen_name"), -screen_name) %>%
#select(tweets, mentions_screen_name, -screen_name) %>% #ends_with("screen_name")
unlist() %>%
tibble(interaction_screen_name = .) %>%
drop_na() %>%
group_by(interaction_screen_name) %>%
summarise(num = n()) %>%
arrange(desc(num)) %>%
filter(interaction_screen_name %in% mpp_handles$handle_txt)
}
# get a long list of the number of interactions from each source to each target
mpp_interactions <- mpp_tweets %>%
mutate(i = purrr::map(tweets, get_interactions)) %>%
select(source_name = name, source_screen_name = handle_txt, i, source_party = party) %>%
rowid_to_column(var = "source_id") %>%
mutate(source_id = source_id - 1) %>%
unnest(cols = c(i)) %>%
#select(-i) %>%
drop_na() %>%
rename(target_screen_name = interaction_screen_name)
# extract just the individual node ids to add some information back to the interaction data
mpp_node_ids <- mpp_interactions %>%
select(node_id = source_id, screen_name = source_screen_name, name = source_name, party = source_party) %>%
distinct()
# join the target data to the interactions data
mpp_interactions <- mpp_node_ids %>%
rename(target_id = node_id, target_screen_name = screen_name, target_name = name, target_party = party) %>%
right_join(mpp_interactions, by = c("target_screen_name")) %>%
select(starts_with("source"), num, starts_with("target")) %>%
arrange(source_id, desc(num))
# create a function to prep the data and make a directed force graph for MPPs
# with a given minimum number of interactions
make_directed_force_graph <- function(mpp_interactions){
# create links for a directed force graph
force_links <- mpp_interactions %>%
filter(source_id != target_id) %>%
mutate(num = num / 1000)
# create nodes for a directed force graph
force_nodes <- mpp_node_ids %>%
select(name, party)
# set up a colour scale
col_scale <- c( "PCP"= "#1A4782",
"NDP"= "#F37021",
"LIB"= "#D71920",
"GRN" = "#3D9B35",
"IND"= "#CCCCCC")
#["PCP", "NDP", "LIB", "GRN", "IND"]
#["#1A4782","#F37021","#D71920","#3D9B35", "#CCCCCC"]
forceNetwork(Links = force_links, #ff,
Nodes = force_nodes, #nn,
Source = "source_id",
Target = "target_id",
Value = "num",
NodeID = "name",
Group = "party",
colourScale = JS('d3.scaleOrdinal()
.domain(["PCP", "NDP", "LIB", "GRN" ,"IND"])
.range(["blue","orange","red","green", "gray"]);'),
opacity = 1,
opacityNoHover = 0.5,
arrows = TRUE,
zoom = TRUE)
}
# now call the function with our prepared data
#make_directed_force_graph(mpp_interactions)
# create an igraph object from our data
mpp_graph <- mpp_interactions %>%
select(source_name, target_name, everything()) %>%
filter(source_name %in% mpp_node_ids$name,
target_name %in% mpp_node_ids$name) %>%
igraph::graph_from_data_frame(vertices = select(mpp_node_ids, name, party))
# create a reactive-force layout
set.seed(1234)
fr <- igraph::layout_with_fr(mpp_graph)
# set the colours
V(mpp_graph)$color <- party_colours(V(mpp_graph)$party)
# remove labels, since we can't read them anyway!
# make the plot
plot(mpp_graph, layout = fr, vertex.label = NA,
main = "Directed Force Graph for MPP Tweet Engagements")
It looks like almost everyone is interacting with everyone else, at least to some degree. I’m presenting this as an unlabeled static image because the labels were unreadable, and with so many nodes and edges it was slow as an interactive simulation. (There’s an interactive one down below, if you’re impatient.)
But there’s some order to this plot. The graph divides roughly into three camps, with one cluster for the governing PCs and one for the opposition NDP. This suggests that MPPs within each camp interact more with each other than with the opposing party, which binds them together. However, NDP MPPs also direct a lot of attention at high-profile PC MPPs. There are quite a few cabinet members right at the border, including Christine Elliott, Lisa MacLeod, Stephen Lecce, and–naturally–Doug Ford. This suggests that although they are more tightly bound to members of their own party, they attract enough attention from the opposition to pull them to the edge of the cluster.
Most Liberals and the one Green member are clustered roughly between the two main camps, suggesting that they tend to interact equally with both groups.
Only Strong Ties
To find some deeper structure, let’s focus on strong ties between people who interact more often. The histogram below shows the distribution of interaction counts on a log-x axis. Most MPPs interact with a given MPP only infrequently–sometimes only once–but there’s a long right tail.
mpp_interactions %>%
ggplot() +
geom_histogram(aes(x=num),
bins = 15,
fill = "darkgrey") +
scale_x_continuous(trans = "log", breaks = c(1,10,100,3000)) +
theme_minimal() +
labs(x = "Number of Interactions",
y = "Count",
title = "Histogram of Interaction Counts between MPPs on Twitter")
This suggests that we could learn more by focusing on these stronger ties, since people who interact more often probably have a more meaningful political relationship.
Eyeballing the histogram, let’s make a network graph only including people who have interacted more than 25 times. Now the thickness of the line depends on how many interactions they’ve had.
This is an interactive simulation using networkD3, so if you’re on a desktop you can use your mouse to pan around, click and drag nodes to watch them wobble around, and adjust the zoom with your mouse wheel.
mpp_interactions %>%
filter(num > 25) %>%
make_directed_force_graph()
Now we’re getting somewhere! There are three clusters for the main parties, showing that MPPs interact most with other party members. The Premier, Doug Ford, is now dragged out into centre stage: almost everyone is talking to or about him. PC Cabinet members aren’t nearly so exposed in this graph, meaning a opposition MPPs interact with them only infrequently. And Randy’s off on his own, yelling at Doug.
Focusing on the leaders, Mike Schreiner interacts only with Premier Doug Ford and the NDP leader, Andrea Horwath. I suppose that befits a party leader. Surprisingly, apart from Schreiner, only NDP MPPs interact strongly with Horwath. I’m not sure if the PCs are hands-off as a matter of party discipline or simply through lack of interest.
Summing Up
We’ve learned quite a bit about how Ontario’s MPPs and political parties use Twitter.
- Most parties are quite online. More than 80% of each party’s members are active on Twitter, with at least one public post.
- Most MPPs post a lot of original content. Most MPPs post slightly more than one original tweet for each retweet. There’s wide variation though, and some post almost no retweets while others post nearly nothing but.
- There’s a lot of variation in MPPs’ volume, velocity, and quality. We looked at the distribution and divided them into four categories–Beginners, Influencers, Extremely Online, and Tryhards.
- Parties tweet about different things. We found substantial differences in word frequencies between parties, and some of those differences matched their political positions.
- Nearly everyone talks to everyone sometimes. The full network graph of all interactions was extremely interconnected, although as we saw there was some structure along party lines.
- Connections are generally strongest between parties, but everyone talks about Doug Ford. We found clear party clusters when we looked only at strong network connections, but the Premier was the clear and singular focus of everyone’s online conversations.
We could go into more detail about any of these (or much more besides!) but this feels like a good place to stop. If you have thoughts, feel free to leave a comment or send me an email.