data-infra
data-infra copied to clipboard
Detailed GTFS-RT completeness data requests
User story / feature request
As a transit data analyst, I would like to have the ability to quickly calculate various metrics of "GTFS-RT completeness", so that I can coordinate with transit agencies to try to help them make their realtime data complete
Acceptance Criteria
It would be really cool to be able to have some columns in some tables accessible via metabase that allow for querying the following (items are in rough order of desirability as of the creation of this issue):
- Number of distinct timestamps per trip per day in both the trip updates and vehicle positions
- Number of distinct lat/lon combos per trip per day in the vehicle positions
- Average percent of remaining stops in a trip that have a StopTimeUpdate per trip per day
- An easy way to get the scheduled duration of a trip without having to query a stop times table
- A way to calculate the observed duration of a trip per day
- Number of distinct timestamps per message header per day in both the trip updates and vehicle positions
Notes
I can break this up into multiple issues if needed.
I have already made a query for distinct lat/lon combos per trip per day in the vehicle positions, but the query is kind of slow and can't be easily expanded across multiple days. Also, I have made a similar query for distinct lat/lon combos for a single route.
This issue potentially supersedes #2524.
Number of distinct timestamps per trip per day in both the trip updates and vehicle positions
This is available in fct_observed_trips, we have a variety of different timestamp counts by trip.
Number of distinct lat/lon combos per trip per day in the vehicle positions
This is recoverable from fct_vehicle_locations, could potentially be added to fct_observed_trips.
Average percent of remaining stops in a trip that have a StopTimeUpdate per trip per day
We don't currently do stop level comparisons between trips in schedule and RT, we could but it would be performance intensive.
An easy way to get the scheduled duration of a trip without having to query a stop times table
This can be calculated from fct_scheduled_trips, first and last stop times are included there
A way to calculate the observed duration of a trip per day
Related to the question above, we don't yet have this... Somewhat complicated to know when the trip "starts" from RT data (since RT data may be available well before the vehicle is in progress), related to interpolating its departure from first stop? I believe @owades may be working on this at some point.
Number of distinct timestamps per message header per day in both the trip updates and vehicle positions
I am not sure what message header means here -- fct_observed_trips has count of distinct header timestamps per trip?