OCTranspo Route Mapping in R: Part 1

In this post I’m going to explain how to start using R’s tidytransit package to plan trips on OCTranspo, Ottawa’s public transit system. OCTranspo makes a lot of its data available online, but they leave out a key file called transfers.txt that tells the system how you can move from one route to another. You need this file for trip planning, so the first step to planning trips is to generate our own transfers.txt using reasonable assumptions about how transit users can move between route stops on foot.

GTFS: The Data Format

OCTranspo publishes its route information in an open-source format called GTFS, which stands for the General Transit Feed Specification. GTFS has an interesting history, but the short version is that Google Worked with transit operators to develop a standard machine-readable format for transit data so that it could be loaded into Google Maps. And many transit agencies have adopted the standard, so presumably a Heritage Minute is in production.

Getting the Data

To OCTranspo’s credit, the data is very easy to get: here’s a direct link, and here’s a link to OCTranspo’s page with documentation and terms of use.

The Case of the Missing transfers.txt

As might be expected, OCTranspo’s GTFS data has a problem: it ships without a file called transfers.txt, which–helpfully–tells tidytransit how passengers can move from one route to another. Without transfers.txt, our simulated riders are anchored to their bus sets, faces pressed against the glass as they watch other buses come and go, doomed to ride the 88 forever, like an extremely boring chasse-galerie.

The bad news is that the GTFS specifications say that transfers.txt is optional, so OCTranspo isn’t required to provide it. Those same specs also blithely say that “GTFS-consuming applications interpolate transfers based on allowable time and stop proximity,” but unfortunately I haven’t found a way to do that within tidytransit itself. So we’ll have to do it ourselves.

Generating Our Own transfers.txt

I solved the problem in a time-honoured (i.e. effective) but workmanlike (i.e. neither fancy nor elegant) way: nested loops.

  1. Load OCTranspo’s google_transit.zip.
  2. Choose a plausible distance \(D\) people would be willing to walk between transfers.
  3. Choose a plausible minimum time \(t\) it would take for people to make a transfer.
  4. For each transit stop, find any other stops within distance \(D\) and mark a transfer between the two that takes time \(t\).
  5. Save the results in a file called transfers.txt that aligns with the GTFS specs.
  6. Add your new transfers.txt to OCTranspo’s zip file. I tend to rename mine something like google_transit_transfers.zip to make it clear that it’s not OCTranspo’s original file

That’s it. To use your new file, just add it to google_transit.zip and load it using tidytransit::read_gtfs(), and everything should just work.

The code is below, but I’ll also save you the trouble: Here is an OCTranspo transfers.txt generated using the code below on May 19, 2020, assuming you can transfer between all stops within 100m of each other and assuming all transfers take 5 minutes. Note that this file will be stale once OCTranspo updates its routes, but you can use the code to generate a new version.

# This is the code that generates a transfers.txt file from OCTranspo's data.
# You need transfers.txt to do any route planning, since it tells the system how users
# move from one bus/LRT to another. Here we said people can transfer to any stop within
# 50m of another stop, since that seemed like a reasonable walking distance.

# load the OCTranspo schedule data
octranspo_schedule <- read_gtfs(
  "data/google_transit.zip"
)

### WE CAN MAKE A TRANSFERS.TXT TABLE FOR OURSELVES
## THIS TOOK LIKE 2 HOURS TO RUN
plausible_transfers <- tibble()

for (i in 1:nrow(octranspo_stops)){
  print (paste("Stop", i, "/", nrow(octranspo_stops)))
  stop_id <- octranspo_stops$stop_id[i]
  stop_lat <- octranspo_stops$stop_lat[i]
  stop_lon <- octranspo_stops$stop_lon[i]
  for (j in 1:nrow(octranspo_stops)) {
    max_distance = 50 # in meters
    if ( i != j){ # make sure the stops aren't the same
      dest_lon = octranspo_stops$stop_lon[j]
      dest_lat = octranspo_stops$stop_lat[j]
      # calculate distance using a function from the geosphere library
      # https://www.rdocumentation.org/packages/geosphere/versions/1.5-10/topics/distm
      # this is the proper way to do it
      distance <- distm(c(stop_lat, stop_lon), c(octranspo_stops$stop_lat[j], octranspo_stops$stop_lon[j]))
      if (distance < max_distance) {
        transfer_tibble <- tibble(from_stop_id = octranspo_stops$stop_id[i], to_stop_id = octranspo_stops$stop_id[j], transfer_type=2, min_transfer_time=300)
      plausible_transfers <- rbind(plausible_transfers, transfer_tibble)  
      } # end if distance < max_distance
    }
  } # end if j
} # end if i

write_csv(plausible_transfers, "transfers.txt")

Putting it Together: A tiny tidytransit demo

tidytransit is a fantastic R package that makes it easy to do a whole lot of public-transit planning and analysis. Here’s a simple demo using our brand-new transfers.txt to find all the stops reachable within 20 minutes from Sussex and Rideau Falls. (Click the “Code” button below the map to see how it’s done.)

library(tidyverse)
library(tidytransit)
library(leaflet)
library(htmltools)
octranspo <- tidytransit::read_gtfs("google_transit_transfers.zip")

## Choose your trip day, start time, and length
trip_date <- "2020-05-23"     # April 15 is a Wednesday
trip_start_time <- 11         # integer number of hours in 24-hour format
trip_length <- 20

# this filters the stop times for the selected date and times
stop_times <- filter_stop_times(octranspo, trip_date, trip_start_time*3600, trip_start_time*3600+60*trip_length)

# generate travel times from the first stop to all other stops on the network.
first_stop <- "SUSSEX / RIDEAU FALLS"
try(tts <- travel_times(stop_times, first_stop, return_coords = TRUE))

# set up labels
labs <- paste0("Stop Name: ",tts$to_stop_name,"<br/>
                Travel Time (min): ",tts$travel_time/60 ," <br/>
                # of Transfers: ",tts$transfers)
label_fun <- function(x) lapply(labs, FUN = function(x) htmltools::HTML(x))

# make a map
tts %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(lng=tts$to_stop_lon,
             lat=tts$to_stop_lat,
             label = label_fun())

Some Thoughts on Optimizing

If I were doing this again, I’d look for a few ways to improve on the nested-loop algorithm. E.g.:

  • Take advantage of the symmetry: If A is within x metres of B, then B is also within x metres of A, so there’s no need to recompute.
  • Look at other functions: I suspect that geosphere::distm() is the bottleneck here when calculating the distance between each point. If I were doing this again I’d look at sf::st_within(), which I’ve since used in other contexts and it’s worked quite well.

Acknowledgements

This work was a small part of an MBA project I did for the Ottawa Food Bank, so I need to shout-out my great teammates Katie Carr and Priyaanka Arora and our intrepid prof Peter Rabinovitch.

Christopher Belanger, PhD MBA
Christopher Belanger, PhD MBA
Data Scientist
Researcher
Policy Expert

My research interests include data science, marketing, and public policy, bridging the quantitative-qualitative divide.

comments powered by Disqus