data-infra icon indicating copy to clipboard operation
data-infra copied to clipboard

Detailed GTFS-RT completeness data requests

Open evansiroky opened this issue 2 years ago • 1 comments

User story / feature request

As a transit data analyst, I would like to have the ability to quickly calculate various metrics of "GTFS-RT completeness", so that I can coordinate with transit agencies to try to help them make their realtime data complete

Acceptance Criteria

It would be really cool to be able to have some columns in some tables accessible via metabase that allow for querying the following (items are in rough order of desirability as of the creation of this issue):

  1. Number of distinct timestamps per trip per day in both the trip updates and vehicle positions
  2. Number of distinct lat/lon combos per trip per day in the vehicle positions
  3. Average percent of remaining stops in a trip that have a StopTimeUpdate per trip per day
  4. An easy way to get the scheduled duration of a trip without having to query a stop times table
  5. A way to calculate the observed duration of a trip per day
  6. Number of distinct timestamps per message header per day in both the trip updates and vehicle positions

Notes

I can break this up into multiple issues if needed.

I have already made a query for distinct lat/lon combos per trip per day in the vehicle positions, but the query is kind of slow and can't be easily expanded across multiple days. Also, I have made a similar query for distinct lat/lon combos for a single route.

This issue potentially supersedes #2524.

evansiroky avatar May 05 '23 00:05 evansiroky

Number of distinct timestamps per trip per day in both the trip updates and vehicle positions

This is available in fct_observed_trips, we have a variety of different timestamp counts by trip.

Number of distinct lat/lon combos per trip per day in the vehicle positions

This is recoverable from fct_vehicle_locations, could potentially be added to fct_observed_trips.

Average percent of remaining stops in a trip that have a StopTimeUpdate per trip per day

We don't currently do stop level comparisons between trips in schedule and RT, we could but it would be performance intensive.

An easy way to get the scheduled duration of a trip without having to query a stop times table

This can be calculated from fct_scheduled_trips, first and last stop times are included there

A way to calculate the observed duration of a trip per day

Related to the question above, we don't yet have this... Somewhat complicated to know when the trip "starts" from RT data (since RT data may be available well before the vehicle is in progress), related to interpolating its departure from first stop? I believe @owades may be working on this at some point.

Number of distinct timestamps per message header per day in both the trip updates and vehicle positions

I am not sure what message header means here -- fct_observed_trips has count of distinct header timestamps per trip?

lauriemerrell avatar Aug 28 '23 18:08 lauriemerrell