OTP/Delays Analysis Use Case Profile Development
Overview
The On-Time Performance (OTP) and Delays Analysis use case was identified as the highest priority profile during the April 2025 Contributors Group meeting, receiving 9 votes. This profile would define the requirements, methodologies, and best practices for using TIDES data to analyze transit delays and on-time performance at a granular, actionable level. As the first TIDES use case profile, this development work will help shape a key component of the eventual TIDES Implementation Guide documentation deliverable.
Scope
This profile will focus on:
- Vehicle delay analysis: Identifying locations, patterns, and magnitudes of delays
- Passenger-weighted delay analysis: Quantifying impact on riders
- Cross-modal analysis: Comparing delay patterns across different transit modes
Future extensions may include:
- Spatial visualization: Mapping delays to street networks
Required TIDES Tables
- vehicle_locations: core table for tracking vehicle movements and calculating delays; provides raw data for speed, location, and timestamp analysis
- trips_performed: links vehicle movements to specific trips and provides context for scheduled versus actual service
- stop_visits: captures arrival and departure times at stops; essential for stop-level delay analysis
- passenger_events: provides ridership context for passenger-weighted delay analysis and helps quantify the impact of delays on passengers
Working Group Tasks
- Initial Research and Scoping (By May 24)
  - Review/consider existing delay analysis methodologies to identify common metrics and approaches
  - Document agency variations in delay analysis
  - Clearly define core requirements vs. future extensions
- Field Requirements Definition (By May 31)
  - Identify essential fields for core delay analysis
  - Document field relationships and dependencies
  - Define minimum data quality requirements
  - Outline potential extension fields for future consideration
- Implementation Patterns Documentation (By June 7)
  - Document common implementation approaches for core functionality
  - Identify technical challenges and solutions
  - Develop example implementation patterns
  - Note where street network integration might enhance future implementations
- Draft Profile Development (By June 12)
  - Compile research into draft profile
  - Develop validation approach
  - Document extension points, particularly for spatial visualization
- Community Review Preparation (By June 12)
  - Prepare presentation for Contributors Group
  - Identify key discussion points
  - Develop examples for illustration
Discussion Questions
- How should we balance standardization with flexibility in delay analysis methodologies?
- What are the minimum data requirements for meaningful delay analysis?
- What validation rules would ensure data quality for this use case?
- How can we accommodate both bus and rail delay analysis in a consistent framework?
- For future extensions: What approach to street network integration would be most valuable given the challenges raised in the May Contributors meeting?
Recent Meeting Insights
April Contributors Group Meeting
- Requires fine-grain AVL data for effective analysis
- Discussion of potential future extensions included:
- Stop segment analysis (comparing to fastest observed times)
- Street segment histograms (10-meter segments with travel speeds)
- Shared Streets tool mentioned as a reference for potential street network integration
May Contributors Group Meeting
- Discussion highlighted implementation challenges that would need to be addressed:
- Changing street network identifiers over time
- Selection of base map (OSM vs. alternatives)
- Representation of multiple location values:
- Observed location
- Mapped location
- Linear referencing of the mapped location
- Street network matching discussed as a future extension opportunity
Implementers Group Meetings
- Schedule linking complexity identified as a challenge
- Weekly GTFS updates create issues for historical linkage
- Vendor data mapping issues noted
- Vehicle locations often lack schedule information
- Balance needed between data completeness and practical utility
- Need for use-case-specific field requirements to help prioritize implementation
Future Extension Opportunities
The May Contributors Group discussion on extension mechanisms identified street network integration as a significant future opportunity:
- Street Network Integration (Future Direction)
  - Extensions for linking vehicle positions to street segments
  - Fields for street segment identifiers
  - Fields for relative position along segments
  - Useful for interpolating intermediate locations between pings
- Location Representation Options (For Future Consideration)
  - Observed location (raw GPS)
  - Mapped location (snapped to network)
  - Linear referencing of the mapped location
Use Case Profile Working Group Members
- Laurie Merrell (Jarvus) @lauriemerrell
- Ian Detamore (PENNDOT) @idetamore
- Joey Reid (Metro Transit) @botanize
- [Seeking additional participants]
This issue builds on the April Contributors Group prioritization of TIDES use case profiles to be developed, and addresses implementation challenges identified in the Implementers Group meetings. While we have incorporated insights from the May Contributors Group discussion on extension mechanisms, street network integration may be best suited as a future extension rather than a core requirement for the initial profile development. But please, let's discuss this!
I've used at least 2.5 variations of delay analysis:
- Stop-segment delay: coarse spatial resolution; stop-segments can span 100 meters to many kilometers.
  a. Calculate travel times between origin and destination stops (typically stop arrival - previous stop departure).
  b. Stop-segment delay mapped to streets: assign the segment delay to the streets between the stops. This allows combining multiple routes that operate on the same streets but share either the origin or destination stop, but not both.
- Street-segment delay: arbitrary resolution. Match AVL to 10-m segments of the street network and calculate delay in each segment. We've used both sharedstreets-matcher and an in-house process to project vehicle locations onto the street network and aggregate spatially and temporally.
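As a rough illustration of variation (a), here is a minimal sketch of stop-segment travel times for a single trip. The field names (`stop_id`, `trip_stop_sequence`, `actual_arrival_time`, `actual_departure_time`) are assumed to match the TIDES stop_visits table, and times are assumed to already be integer seconds; adjust to whatever your data actually contains.

```python
def stop_segment_travel_times(stop_visits):
    """Return (from_stop, to_stop, travel_time_s) for consecutive visits of ONE trip.

    Assumes each visit is a dict with stop_id, trip_stop_sequence,
    actual_arrival_time and actual_departure_time (integer seconds).
    """
    ordered = sorted(stop_visits, key=lambda v: v["trip_stop_sequence"])
    segments = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Travel time = arrival at this stop minus departure from the previous stop.
        travel_time = cur["actual_arrival_time"] - prev["actual_departure_time"]
        segments.append((prev["stop_id"], cur["stop_id"], travel_time))
    return segments


visits = [
    {"stop_id": "A", "trip_stop_sequence": 1, "actual_arrival_time": 0, "actual_departure_time": 20},
    {"stop_id": "B", "trip_stop_sequence": 2, "actual_arrival_time": 110, "actual_departure_time": 125},
    {"stop_id": "C", "trip_stop_sequence": 3, "actual_arrival_time": 305, "actual_departure_time": 320},
]
segments = stop_segment_travel_times(visits)
```

Dwell time at each stop is excluded by construction, since the clock starts at the previous stop's departure.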
Define delay
All of these require defining "delay". It's tempting to use scheduled travel-time as the baseline for stop-segment delay, but those are typically not nearly precise enough as a measure of delay, and are often intentionally inaccurate to avoid vehicles holding in awkward places. Instead I tend to use something closer to a definition of delay used for highways. I calculate "freeflow" travel-time as the average travel-time of the shortest 2% of travel times observed (all times-of-day, all days-of-week).
Once you know how fast it's possible for a vehicle to travel the segment, anything slower is delay.
Some people use freeflow speeds specific to each temporal window, for example, Weekday AM Peak delay would be travel time exceeding the Weekday AM Peak freeflow travel-time. However, I believe this is a mistake, as the fastest speed observed during the AM Peak is likely to represent some level of delay beyond what is technically possible with transit advantages (separated guideway, bus lanes, signal priority), and what is likely to be observed during early morning Sunday service for example.
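To make the freeflow definition concrete, a minimal sketch follows. The function names and the rounding rule for "shortest 2%" (round up, at least one observation) are my assumptions for illustration, not a statement of how any agency implements it.

```python
import math


def freeflow_travel_time(travel_times_s, fastest_share=0.02):
    """Mean of the shortest `fastest_share` of observed travel times (seconds).

    Observations should pool all times of day and days of week for the segment.
    """
    if not travel_times_s:
        raise ValueError("need at least one observed travel time")
    ordered = sorted(travel_times_s)
    # Assumption: round the 2% count up and always keep at least one observation.
    n_fastest = max(1, math.ceil(len(ordered) * fastest_share))
    fastest = ordered[:n_fastest]
    return sum(fastest) / len(fastest)


def delay_s(travel_time_s, freeflow_s):
    """Anything slower than freeflow is delay; faster trips count as zero delay."""
    return max(0.0, travel_time_s - freeflow_s)


observed = list(range(60, 160))  # 100 hypothetical travel times, 60..159 s
freeflow = freeflow_travel_time(observed)
```

With these 100 observations the two shortest trips (60 s and 61 s) define freeflow, so a 90-second trip carries 29.5 seconds of delay.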
Stationarity
Freeflow speeds are likely to be fairly stable, but lane closures or street reconfigurations can cause short- or long-term changes to freeflow speeds for a stop or street segment. I see two basic options. First, define the analysis window to be a period of relative consistency. We typically calculate delay for a quarter and assume freeflow speeds are stable within that period. If there are known issues, like lane closures or street reconstructions, those can be accounted for when choosing the analysis window. The other option would be to use a weighting scheme, like a rolling window or exponential decay, to weight recent observations more heavily than older ones.
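The exponential-decay option could be sketched as below. This is only one possible reading: each observation gets a weight that halves every `half_life_days` (a hypothetical parameter), and freeflow becomes the weighted mean of the fastest observations that together hold 2% of the total weight.

```python
def weighted_freeflow(observations, half_life_days=30.0, fastest_share=0.02):
    """Decay-weighted freeflow travel time.

    observations: iterable of (age_days, travel_time_s) pairs, where age_days
    is how long ago the trip was observed. All names here are illustrative.
    """
    weighted = sorted(
        (travel_time, 0.5 ** (age_days / half_life_days))
        for age_days, travel_time in observations
    )
    if not weighted:
        raise ValueError("need at least one observation")
    total_weight = sum(w for _, w in weighted)
    budget = fastest_share * total_weight  # weight allotted to the fastest trips
    used = 0.0
    weighted_sum = 0.0
    for travel_time, w in weighted:
        take = min(w, budget - used)  # last observation may be used partially
        if take <= 0:
            break
        weighted_sum += travel_time * take
        used += take
    return weighted_sum / used
```

When every observation has the same age this reduces to the plain mean of the fastest 2%; old, fast observations from before a street reconfiguration gradually stop dominating the estimate.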
Spatial and temporal resolution
Every vehicle location observation can be assigned to arbitrary time-periods, service windows or schedule types, e.g., Weekday-Midday, allowing observations to be aggregated in arbitrary time periods.
Spatial resolution depends on the source of travel time. For stop-segment delay, spatial resolution is limited to the segment itself, which can vary greatly. For street-matched or projected observations the spatial resolution can be more arbitrary (1 m, 10 m, 100 m), but ultimately the precision depends on the reporting rate (and accuracy) of vehicle positions. When instantaneous speed is available it can be assigned to the projected location directly, but that results in a type of sampling at the street segment level. The average speed between observations (e.g., the length of the line-segment connecting vehicle positions divided by the time between observations) could be applied to each street segment bin between the origin location and destination location. More sophisticated techniques could account for instantaneous and segment speeds together.
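A minimal sketch of the "average speed between observations" approach, assuming positions have already been projected onto the route and expressed as distance along it in meters (a linear reference). The function name and the 10 m default are illustrative only.

```python
import math


def speeds_by_bin(pos_a_m, t_a_s, pos_b_m, t_b_s, bin_m=10.0):
    """Assign the average speed between two pings to every bin they span.

    Returns {bin_index: speed_m_per_s}; bin i covers [i * bin_m, (i + 1) * bin_m).
    Returns {} when the vehicle did not move forward or time did not advance.
    """
    elapsed = t_b_s - t_a_s
    distance = pos_b_m - pos_a_m
    if elapsed <= 0 or distance <= 0:
        return {}
    speed = distance / elapsed
    first_bin = int(pos_a_m // bin_m)
    # A ping exactly on a bin boundary has not entered the next bin yet.
    last_bin = math.ceil(pos_b_m / bin_m) - 1
    return {i: speed for i in range(first_bin, last_bin + 1)}


bins = speeds_by_bin(12.0, 0, 47.0, 7)
```

Here the vehicle covers 35 m in 7 s, so bins 1 through 4 each receive 5 m/s. A stationary vehicle yields no bins in this sketch; real analyses would need to decide how to attribute that stopped time.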
In addition to vehicle delay we use a couple of more standard metrics of schedule performance, schedule adherence (often called on-time performance) and headway adherence. These are so simple that the biggest point of contention is usually around the standards of performance, for example, we consider a vehicle on-time if it's between 59 seconds early and 5 minutes late, and headways ok if they are under 140% of scheduled headway. But there's a lot of room for argument there.
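Both checks are easy to parameterize. The sketch below uses the thresholds quoted above as defaults; treating the on-time bounds as inclusive and the headway bound as strict is my assumption, and exactly the kind of detail worth writing down.

```python
def is_on_time(deviation_s, max_early_s=59, max_late_s=300):
    """Schedule adherence at a timepoint.

    deviation_s = actual - scheduled time in seconds (negative means early).
    Assumption: both bounds are inclusive.
    """
    return -max_early_s <= deviation_s <= max_late_s


def headway_ok(actual_headway_s, scheduled_headway_s, max_ratio=1.4):
    """Headway adherence: OK when strictly under max_ratio x scheduled headway."""
    return actual_headway_s < max_ratio * scheduled_headway_s


deviations = [-75, -59, 0, 240, 300, 301]
on_time_share = sum(is_on_time(d) for d in deviations) / len(deviations)
```

Working in integer seconds keeps the early bound unambiguous: 59 seconds early passes, 60 seconds early does not.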
I'd definitely defer to Joey for the data science & agency nuance here, some stray thoughts from my relatively-outsider perspective:
- As Joey noted, seems like there might be a need to measure delay performance purely against a schedule vs. measure delay performance based on headways -- these seem like substantially different methodologies to support
- For measuring vs. schedule, big question we're running into is timepoints -- not necessarily standard to indicate timepoints in GTFS, so may need other schedule sources for this (use of other sources for this is presumably already established within agencies, but TIDES docs kind of emphasize using GTFS for schedule info)
- As we've discussed offline, wondering if we think it's possible to do passenger weighting with just stop_visits level data? Per discussion not all sources will have passenger-event level granularity
- Things that seem like they need to be parameterizable (if someone were building a tool to calculate this using TIDES):
- Thresholds for what counts as "delayed", either in minutes/seconds or (see below) perhaps in headway multiples
- Time of day bucketing behavior (AM / PM peak definition)
- Also curious about service types or categories that people might want to specify for reporting purposes -- TIDES doesn't necessarily natively support much additional/custom labeling (maybe guidance on whether this is done within TIDES models or downstream...?), but I'm thinking about things like express vs. local routes (noted in MTA dashboards linked below) that might be grouped together for metric reporting
Looking at a few public resources:
- CTA publicly reports:
- Rail:
- Cases where headway is 2x or 3x scheduled
- Delay of 10 min or more
- Bus:
- % of buses with less than 60 sec between them
- Times when interval between buses is >2x scheduled and also >15 min
- MTA publicly reports:
- Subway:
- "Customer Journey Time Performance is the estimated percentage of customers' trips that are completed within 5 minutes of the scheduled time. Customer Journey Time is measured from 6 AM to 11 PM on weekdays, with the peak period including 7 AM to 10 AM and 4 PM to 7 PM." Open data
- They also seem to publish a raw count of delayed trains by category -- open data
- Bus: "Bus Customer Journey Time Performance (CJTP) estimates the percentage of customer trips with a total travel time within 5 minutes of the scheduled time. Bus Customer Journey Time Performance is equivalent to the percentage of customer trips with Additional Bus Stop Time (ABST) + Additional Travel Time (ATT) less than 5 minutes. Like ABST and ATT, CJTP is estimated for each individual bus a customer uses in their journey, not all buses in their journey combined." Open data
- Commuter rail (LIRR + Metro North) just seems to have counts of delayed trains
- For measuring vs. schedule, big question we're running into is timepoints -- not necessarily standard to indicate timepoints in GTFS, so may need other schedule sources for this (use of other sources for this is presumably already established within agencies, but TIDES docs kind of emphasize using GTFS for schedule info)
Good point, schedule adherence is only defined at timepoints. We've been using the timepoint field of stop_times.txt for this, as it is part of the spec. I'd strongly suggest that anyone attempting this kind of work in TIDES implement the appropriate field in the relevant schedule data standard instead of adding another field to TIDES.
- As we've discussed offline, wondering if we think it's possible to do passenger weighting with just stop_visits level data? Per discussion not all sources will have passenger-event level granularity
Yeah, that's what we do. Just use the origin stop's calculated departure load for the passenger component.
If you want to calculate passenger delay from vehicle location records (as opposed to stop-segments from stop_visits), you could still use the stop-visits level APC and join to locations on vehicle and timestamp.
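A minimal sketch of the stop_visits-level weighting described above: multiply each stop-segment's delay by the departure load at its origin stop. The dictionary keys (`travel_time_s`, `freeflow_s`, `departure_load`) are hypothetical names for values you would derive from stop_visits and your own freeflow estimate.

```python
def passenger_delay_seconds(segments):
    """Total passenger-seconds of delay across stop-segment traversals.

    Each segment is a dict with travel_time_s, freeflow_s and departure_load
    (riders on board when leaving the origin stop). Names are illustrative.
    """
    total = 0.0
    for seg in segments:
        vehicle_delay = max(0.0, seg["travel_time_s"] - seg["freeflow_s"])
        total += vehicle_delay * seg["departure_load"]
    return total


traversals = [
    {"travel_time_s": 120, "freeflow_s": 90, "departure_load": 20},  # 30 s x 20 riders
    {"travel_time_s": 85, "freeflow_s": 90, "departure_load": 35},   # faster than freeflow
    {"travel_time_s": 200, "freeflow_s": 150, "departure_load": 0},  # empty vehicle
]
total_passenger_delay = passenger_delay_seconds(traversals)
```

An empty vehicle contributes nothing here even when it is badly delayed, which is the point of reporting vehicle delay and passenger delay side by side.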
Things that seem like they need to be parameterizable (if someone were building a tool to calculate this using TIDES):
- Thresholds for what counts as "delayed", either in minutes/seconds or (see below) perhaps in headway multiples
Seconds would best match the existing datetime definitions like actual_departure_time, which are often stored as 64-bit integers of seconds or milliseconds. Seconds also help a little with the inclusive or exclusive range issue, e.g., does one minute early mean <=60 seconds or <60 seconds?
- Time of day bucketing behavior (AM / PM peak definition)
- Also curious about service types or categories that people might want to specify for reporting purposes -- TIDES doesn't necessarily natively support much additional/custom labeling (maybe guidance on whether this is done within TIDES models or downstream...?), but I'm thinking about things like express vs. local routes (noted in MTA dashboards linked below) that might be grouped together for metric reporting
I think both of these could be built off of GTFS timeframes.txt or could be handled by a couple of dimension tables, e.g., dim_time and dim_date.
dim_time could have fields like time (integer seconds), time period, rush, etc., while dim_date would have fields like service date, service type name, booking or service pick name, etc. The dimension tables are probably easier to implement and use, at some cost of duplicating some of the information in GTFS Fares v2.
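For illustration, a dim_time table could be generated along these lines. The period names and boundaries below are placeholders I made up, not a recommendation; each agency would substitute its own definitions.

```python
from bisect import bisect_right

# Hypothetical period boundaries: (start in seconds after midnight, label).
PERIOD_STARTS = [
    (0, "Early"),
    (6 * 3600, "AM Peak"),
    (9 * 3600, "Midday"),
    (15 * 3600, "PM Peak"),
    (18 * 3600, "Evening"),
]
_STARTS = [start for start, _ in PERIOD_STARTS]
RUSH_PERIODS = {"AM Peak", "PM Peak"}


def time_period(seconds):
    """Label for a time given as integer seconds after the start of the service day."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return PERIOD_STARTS[bisect_right(_STARTS, seconds) - 1][1]


def build_dim_time(step_s=60, day_s=24 * 3600):
    """One row per `step_s` seconds with the fields suggested for dim_time."""
    rows = []
    for t in range(0, day_s, step_s):
        period = time_period(t)
        rows.append({"time": t, "time_period": period, "rush": period in RUSH_PERIODS})
    return rows


dim_time = build_dim_time(step_s=3600)  # hourly rows, just for a quick look
```

Observations then join to dim_time on their time-of-day in seconds, so changing a peak definition means editing one table rather than every query. Service-day times past 24:00:00 fall into the last period in this sketch.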
Hi all, I'm Eric Dasmalchi and I've worked on Caltrans' California Transit Speed Maps for the last few years.
Right now the speed maps site uses hybrid segments: usually stop to stop but every 1km where stops are farther than 1km apart to provide useful granularity for rural or freeway services.
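A rough sketch of that hybrid segmentation rule, working only with a stop-to-stop length in meters. How the real speed maps pipeline handles the leftover piece at the end of a long segment is not stated here, so keeping it as a shorter final segment is purely my assumption.

```python
def split_segment(length_m, max_len_m=1000.0):
    """Split a stop-to-stop segment into pieces no longer than max_len_m.

    Returns (start_m, end_m) offsets along the segment. Segments at or under
    max_len_m come back unchanged as a single piece.
    """
    pieces = []
    start = 0.0
    while length_m - start > max_len_m:
        pieces.append((start, start + max_len_m))
        start += max_len_m
    pieces.append((start, length_m))  # assumption: remainder stays as a short piece
    return pieces


urban = split_segment(800)
freeway = split_segment(2500)
```

An 800 m urban segment stays whole, while a 2.5 km freeway segment becomes two 1 km pieces plus a 500 m remainder.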
Most of our stakeholder engagement has been around finding transit priority opportunities, internally within Caltrans in relation to our state highway system but also with MPOs and local jurisdictions, since we aim to analyze as much of California as possible. Given that, we've focused on speeds, speed variability, and frequency over delay.
Schedule-based delay is not well suited to our kinds of analysis: if a transit agency sets a realistic (slow) schedule since the bus is always stuck in traffic and therefore has minimal schedule delay... we still want to get the bus out of that traffic.
With standardized ridership data not yet broadly available, we usually wait to add estimates of ridership once we've narrowed our focus down to a subset of corridors. We have used a fixed reference speed and/or percentage speed increase assumption to estimate delay, but comparison to free flow speed is probably a better approach.
Ultimately, it would be great to understand the many factors that influence transit speeds (stop spacing, payment/all door boarding/dwell times, intersection density...) and be able to model expected speed improvements from transit priority in a location-specific way. We are pursuing that through a research process.
Overall it seems like this is on the right track, happy to talk more if helpful.
I'd like to +1 all the people who have noted that there is a significant difference between analyzing how close a vehicle's travel time is to the expected travel time (e.g., schedule delay and headway variation) versus the theoretical minimum travel time (@botanize's definition of delay compared to free flow).
Many of the major capital investments are expected to improve both, but it's important to distinguish them.