data-infra
Research Question: RT availability inquiry
A key issue we want to help transit providers resolve is GTFS-RT availability. For systems that produce any GTFS-RT, availability describes what percentage of the time a rider who wants a real-time prediction will actually get one. This is a huge topic, so let's do a first pass and define our next steps.
1: Which feeds have valid GTFS-RT?
- How many have a GTFS-RT feed?
- How many have a GTFS-RT feed that returns no errors in the GTFS-RT validator?
- What is the most common validation error across feeds (counting each error once per feed)?
- How many feeds have each validation error?
- How many unique errors are produced per feed?
- What is the average number of unique errors per feed?
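The counts in question 1 can be sketched as below. This is only an illustration: it assumes the validator output has already been flattened into `(feed_id, error_code)` pairs, and the function name, input shape, and error codes in the usage example are hypothetical placeholders rather than anything from an existing pipeline.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_validation(feed_ids, error_records):
    """Summarize validator output across feeds.

    feed_ids: ids of every feed that publishes GTFS-RT (hypothetical input).
    error_records: (feed_id, error_code) pairs, one per reported occurrence.
    """
    feeds = sorted(set(feed_ids))

    # A set per feed means each error is counted once per feed,
    # no matter how many times the validator reported it.
    errors_by_feed = defaultdict(set)
    for feed_id, code in error_records:
        errors_by_feed[feed_id].add(code)

    unique_counts = {f: len(errors_by_feed.get(f, ())) for f in feeds}
    feeds_per_error = Counter(
        code for f in feeds for code in errors_by_feed.get(f, ())
    )
    top = feeds_per_error.most_common(1)

    return {
        "n_feeds": len(feeds),
        "n_clean_feeds": sum(1 for n in unique_counts.values() if n == 0),
        "feeds_per_error": dict(feeds_per_error),
        "most_common_error": top[0][0] if top else None,
        "unique_errors_per_feed": unique_counts,
        "mean_unique_errors": mean(unique_counts.values()) if feeds else 0,
    }


# Usage with made-up feed ids and placeholder error codes.
summary = summarize_validation(
    ["a", "b", "c"],
    [("a", "E001"), ("a", "E001"), ("a", "E002"), ("b", "E001")],
)
```

Feeds that never appear in `error_records` are treated as error-free, so the list of feed ids needs to come from a separate inventory of GTFS-RT feeds.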
2: How complete are the GTFS-RT feeds that are being published?
- For each transit provider, what percentage of trips that appear in their GTFS Schedule data are also represented in their GTFS-RT data? Put another way, what percentage of trips that should be in service are actually appearing in the real-time feed? Note that updates may happen for part of a trip or none at all.
- For each transit provider, how often are GTFS-RT data being refreshed? What percentage of timestamp updates arrive within 20, 40, 60, 80, 100, or 120 seconds of the previous one, and what percentage take longer than 120 seconds?
- What percentage of GTFS-RT feeds in the state are being updated every 20 seconds or less?
- Which feeds, if any, have timestamp update intervals that vary throughout the day by more than 10 seconds?
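One possible way to measure refresh frequency for a single feed is sketched below. It assumes we have collected the feed header timestamps (POSIX seconds) from repeated fetches. The function names, the cumulative reading of the 20 to 120 second thresholds, and the use of UTC hours for the time-of-day comparison are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

THRESHOLDS = (20, 40, 60, 80, 100, 120)  # seconds, taken from the question above


def refresh_intervals(header_timestamps):
    """Seconds between consecutive distinct header timestamps."""
    distinct = sorted(set(header_timestamps))  # repeated fetches of one update collapse
    return [later - earlier for earlier, later in zip(distinct, distinct[1:])]


def share_at_or_below(intervals, thresholds=THRESHOLDS):
    """Fraction of intervals that are <= each threshold (cumulative)."""
    if not intervals:
        return {}
    total = len(intervals)
    return {t: sum(1 for i in intervals if i <= t) / total for t in thresholds}


def varies_by_hour(header_timestamps, tolerance=10):
    """True if the median interval differs by more than `tolerance` seconds
    between any two hours of the day (hours are UTC here, an assumption)."""
    distinct = sorted(set(header_timestamps))
    by_hour = defaultdict(list)
    for earlier, later in zip(distinct, distinct[1:]):
        by_hour[(earlier // 3600) % 24].append(later - earlier)
    medians = [median(values) for values in by_hour.values()]
    return bool(medians) and max(medians) - min(medians) > tolerance


# Usage with made-up timestamps.
intervals = refresh_intervals([0, 15, 15, 30, 75, 200])
shares = share_at_or_below(intervals)
```

The share of intervals longer than 120 seconds is `1 - shares[120]`. Using the median per hour keeps a single long gap (for example, an overnight service break) from dominating the comparison, though a different statistic may suit the analysis better.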
3: For each provider, we want to know more about the times that updates don't appear in the GTFS-RT feeds.
- For each transit provider, what number and percentage of trips do not appear at all in the GTFS-RT feed? This suggests that something is wrong with the hardware on the vehicle and it is sending no updates. When this occurs, it is helpful to know the trip ID, assigned route, and times of the first and last stops.
- For each transit provider, when real-time updates are happening for only part of a trip, where are the updates not happening? This suggests that the hardware works but perhaps passes through a cellular dead zone where updates are dropped. When this occurs, it is helpful to know the latitude/longitude of the stops where updates were expected but did not appear, and to see those depicted on a map. It is useful to have agency-level maps (for one feed) and an aggregate map for the State of California. It is also helpful to know whether zero updates are getting through or, if some are, what percentage of expected updates are coming through.
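The trip-level questions above could be approached along these lines. The sketch assumes two inputs that would have to be built first: the ordered stop ids for each scheduled trip, and the set of stop ids for which at least one real-time update was observed on that trip. Both input shapes and the function names are hypothetical.

```python
def coverage_report(scheduled_stops, observed_stops):
    """Per-trip share of expected stop updates that were actually seen.

    scheduled_stops: {trip_id: [stop_id, ...]} from GTFS Schedule.
    observed_stops: {trip_id: {stop_id, ...}} stops with at least one RT update.
    """
    report = {}
    for trip_id, stops in scheduled_stops.items():
        if not stops:
            continue  # nothing was expected for this trip
        seen = observed_stops.get(trip_id, set())
        missing = [stop for stop in stops if stop not in seen]
        received = len(stops) - len(missing)
        report[trip_id] = {
            "pct_received": 100.0 * received / len(stops),
            "missing_stops": missing,
        }
    return report


def summarize_coverage(report):
    """Split trips into absent (no updates at all) and partial coverage."""
    absent = sorted(t for t, r in report.items() if r["pct_received"] == 0.0)
    partial = sorted(t for t, r in report.items() if 0.0 < r["pct_received"] < 100.0)
    total = len(report)
    return {
        "n_trips": total,
        "absent_trips": absent,
        "pct_absent": 100.0 * len(absent) / total if total else 0.0,
        "partial_trips": partial,
    }


# Usage with made-up trips and stops.
scheduled = {"t1": ["A", "B", "C", "D"], "t2": ["A", "B"], "t3": ["C", "D"]}
observed = {"t1": {"A", "B", "C", "D"}, "t2": {"A"}}
report = coverage_report(scheduled, observed)
summary = summarize_coverage(report)
```

The `missing_stops` lists can then be joined to `stops.txt` (`stop_lat`, `stop_lon`) to place the gaps on an agency-level or statewide map, and the absent trip ids can be joined to `trips.txt` and `stop_times.txt` for the route and first and last stop times.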