OCTranspo Route Mapping in R: Part 1
In this post I’m going to explain how to start using R’s tidytransit
package to plan trips on OCTranspo, Ottawa’s public transit system. OCTranspo makes a lot of its data available online, but they leave out a key file called transfers.txt
that tells the system how you can move from one route to another. You need this file for trip planning, so the first step to planning trips is to generate our own transfers.txt
using reasonable assumptions about how transit users can move between route stops on foot.
GTFS: The Data Format
OCTranspo publishes its route information in an open-source format called GTFS, which stands for the General Transit Feed Specification. GTFS has an interesting history, but the short version is that Google Worked with transit operators to develop a standard machine-readable format for transit data so that it could be loaded into Google Maps. And many transit agencies have adopted the standard, so presumably a Heritage Minute is in production.
Getting the Data
To OCTranspo’s credit, the data is very easy to get: here’s a direct link, and here’s a link to OCTranspo’s page with documentation and terms of use.
The Case of the Missing transfers.txt
As might be expected, OCTranspo’s GTFS data has a problem: it ships without a file called transfers.txt
, which–helpfully–tells tidytransit
how passengers can move from one route to another. Without transfers.txt
, our simulated riders are anchored to their bus sets, faces pressed against the glass as they watch other buses come and go, doomed to ride the 88 forever, like an extremely boring chasse-galerie.
The bad news is that the GTFS specifications say that transfers.txt
is optional, so OCTranspo isn’t required to provide it. Those same specs also blithely say that “GTFS-consuming applications interpolate transfers based on allowable time and stop proximity,” but unfortunately I haven’t found a way to do that within tidytransit
itself. So we’ll have to do it ourselves.
Generating Our Own transfers.txt
I solved the problem in a time-honoured (i.e. effective) but workmanlike (i.e. neither fancy nor elegant) way: nested loops.
- Load OCTranspo’s
google_transit.zip
. - Choose a plausible distance \(D\) people would be willing to walk between transfers.
- Choose a plausible minimum time \(t\) it would take for people to make a transfer.
- For each transit stop, find any other stops within distance \(D\) and mark a transfer between the two that takes time \(t\).
- Save the results in a file called
transfers.txt
that aligns with the GTFS specs. - Add your new
transfers.txt
to OCTranspo’s zip file. I tend to rename mine something likegoogle_transit_transfers.zip
to make it clear that it’s not OCTranspo’s original file
That’s it. To use your new file, just add it to google_transit.zip
and load it using tidytransit::read_gtfs()
, and everything should just work.
The code is below, but I’ll also save you the trouble: Here is an OCTranspo transfers.txt generated using the code below on May 19, 2020, assuming you can transfer between all stops within 100m of each other and assuming all transfers take 5 minutes. Note that this file will be stale once OCTranspo updates its routes, but you can use the code to generate a new version.
# This is the code that generates a transfers.txt file from OCTranspo's data.
# You need transfers.txt to do any route planning, since it tells the system how users
# move from one bus/LRT to another. Here we said people can transfer to any stop within
# 50m of another stop, since that seemed like a reasonable walking distance.
# load the OCTranspo schedule data
octranspo_schedule <- read_gtfs(
"data/google_transit.zip"
)
### WE CAN MAKE A TRANSFERS.TXT TABLE FOR OURSELVES
## THIS TOOK LIKE 2 HOURS TO RUN
plausible_transfers <- tibble()
for (i in 1:nrow(octranspo_stops)){
print (paste("Stop", i, "/", nrow(octranspo_stops)))
stop_id <- octranspo_stops$stop_id[i]
stop_lat <- octranspo_stops$stop_lat[i]
stop_lon <- octranspo_stops$stop_lon[i]
for (j in 1:nrow(octranspo_stops)) {
max_distance = 50 # in meters
if ( i != j){ # make sure the stops aren't the same
dest_lon = octranspo_stops$stop_lon[j]
dest_lat = octranspo_stops$stop_lat[j]
# calculate distance using a function from the geosphere library
# https://www.rdocumentation.org/packages/geosphere/versions/1.5-10/topics/distm
# this is the proper way to do it
distance <- distm(c(stop_lat, stop_lon), c(octranspo_stops$stop_lat[j], octranspo_stops$stop_lon[j]))
if (distance < max_distance) {
transfer_tibble <- tibble(from_stop_id = octranspo_stops$stop_id[i], to_stop_id = octranspo_stops$stop_id[j], transfer_type=2, min_transfer_time=300)
plausible_transfers <- rbind(plausible_transfers, transfer_tibble)
} # end if distance < max_distance
}
} # end if j
} # end if i
write_csv(plausible_transfers, "transfers.txt")
Putting it Together: A tiny tidytransit demo
tidytransit is a fantastic R package that makes it easy to do a whole lot of public-transit planning and analysis. Here’s a simple demo using our brand-new transfers.txt
to find all the stops reachable within 20 minutes from Sussex and Rideau Falls. (Click the “Code” button below the map to see how it’s done.)
library(tidyverse)
library(tidytransit)
library(leaflet)
library(htmltools)
octranspo <- tidytransit::read_gtfs("google_transit_transfers.zip")
## Choose your trip day, start time, and length
trip_date <- "2020-05-23" # April 15 is a Wednesday
trip_start_time <- 11 # integer number of hours in 24-hour format
trip_length <- 20
# this filters the stop times for the selected date and times
stop_times <- filter_stop_times(octranspo, trip_date, trip_start_time*3600, trip_start_time*3600+60*trip_length)
# generate travel times from the first stop to all other stops on the network.
first_stop <- "SUSSEX / RIDEAU FALLS"
try(tts <- travel_times(stop_times, first_stop, return_coords = TRUE))
# set up labels
labs <- paste0("Stop Name: ",tts$to_stop_name,"<br/>
Travel Time (min): ",tts$travel_time/60 ," <br/>
# of Transfers: ",tts$transfers)
label_fun <- function(x) lapply(labs, FUN = function(x) htmltools::HTML(x))
# make a map
tts %>%
leaflet() %>%
addTiles() %>%
addMarkers(lng=tts$to_stop_lon,
lat=tts$to_stop_lat,
label = label_fun())
Some Thoughts on Optimizing
If I were doing this again, I’d look for a few ways to improve on the nested-loop algorithm. E.g.:
- Take advantage of the symmetry: If A is within x metres of B, then B is also within x metres of A, so there’s no need to recompute.
- Look at other functions: I suspect that
geosphere::distm()
is the bottleneck here when calculating the distance between each point. If I were doing this again I’d look atsf::st_within()
, which I’ve since used in other contexts and it’s worked quite well.
Acknowledgements
This work was a small part of an MBA project I did for the Ottawa Food Bank, so I need to shout-out my great teammates Katie Carr and Priyaanka Arora and our intrepid prof Peter Rabinovitch.