gtfs-validator
gtfs-validator copied to clipboard
Fast Travel Between Far Stops not using `stops.stop_timezone`
Bug report
Describe the bug
Fast Travel Between Far Stops
warning incorrectly given where stops change time zones because the field stops.stop_timezone
is not taken into account.
This is occurring roughly here:
https://github.com/MobilityData/gtfs-validator/blob/92c336760d6f662d72102f34b871a0f16d3c5f36/main/src/main/java/org/mobilitydata/gtfsvalidator/validator/StopTimeTravelSpeedValidator.java#L164
For example: Amtrak's Southwest Chief Route travels between Kingman AZ (KNG) and Needles CA (NDL) where the trip 3:
- Departs KNG 12:28 AM Mountain Time
- Arrives NDL 12:26 AM Pacific Time
The stops.stop_timezones
are correctly set, but are not being accounted for in the triggering of this warning, causing the feed to be rejected by Google's validation process.
How we reproduce the bug Steps to reproduce the behaviour:
- Run validator on https://content.amtrak.com/content/gtfs/GTFS.zip
Expected behaviour
Should not trigger a Fast Travel Between Far Stops
warning between KNG and NDL (among others)
Observed behaviour
Triggers a Fast Travel Between Far Stops
warning between KNG and NDL (among others)
Screenshots:
Hi @e-lo!
Thanks for flagging this! As far as I understand stops.stop_timezones
mustn't be considered when using times from stop_times
since all values must be provided in agency.agency_timezone
. Here's a quote from the specification (please see description stop_timezone
for stops.txt). Please let me know if I misunderstood anything!
...
If stop_timezone values are provided, the times in [stop_times.txt]
(https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#stop_timetxt)
should be entered as the time since midnight in the timezone specified by
agency.agency_timezone. This ensures that the time values in a trip
always increase over the course of a trip, regardless of which timezones the trip crosses.
Also, in the Amtrak's Southwest Chief Route
example if stop_times.txt
contain data like
stop_id, arrival_time, departure_time
KNG, <not important>, 00:28:00
NDL, 00:26:00, <not important>
There must be an ERROR telling about wrong time order.
Hi @isabelle-dr,
It looks like there's a confusing usage of should/must/may keywords in the spec. Do you know if there's plan to format them according to RFC 2119?
🤔 🤔 🤔 Hmmm...
I think this conflicts with the descriptions of times in stop_times.txt
, which specify local time in the sentence discussing overnight travel?
For times occurring after midnight on the service day, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.
Regardless, the spec needs to clarify in stop_times.txt
whether it is agency time or local time for the stop which is specified.
Amtrak has implemented it as local time.
- What are other multi-timezone agencies doing currently?
- How are GTFS consumers interpreting it currently?
@Cristhian-HA should there be an issue in /google/transit
to take care of this clarification?
I think this conflicts with the descriptions of times in
stop_times.txt
, which specify local time in the sentence discussing overnight travel?
Note that this also conflicts with the general definition of a GTFS Time value, which is defined as time since "noon - 12h": (https://gist.github.com/derhuerst/574edc94981a21ef0ce90713f1cff7f6.
Hello all,
I wanted to reach out here regarding next steps -- do we believe this is something that should be amended on Google's/MobilityData's end? If not, what do we think is the best advice for Amtrak to no longer see these errors? We will also need to update documentation to make it clearer.
Thanks!
Hello, thanks for flagging this @e-lo!
- To this specific issue
I could not reproduce the issue. I ran the validator on the dataset given in this PR, I got 12 fast_travel_between_far_stops
but no mention of stop NDL
, or trip 3
. See validation results here, using our first iteration on an HTML output (coming soon!).
Additionally, the pairs of stops were all in the same time zone in stops.stop_timezone
:
- WMN & GOV, MCA & GVI, SAY & WLC are all LA
- TBR & TRR are both NYC
UPDATE 29 Apr: this is because the dataset was modified
~~@asvechnikov2 do you have a later version of this rule at Google, that would lead to different results?~~
~~@e-lo are you sure the dataset you linked here is the one that contains the issue? I didn't find trip 3
or stop NDL
in this dataset.~~
- To the spec interpretation:
2.1. The use of must/should and what the severity of this problem should be
It looks like there's a confusing usage of should/must/may keywords in the spec. Do you know if there's plan to format them according to RFC 2119?
This has been done last year (PR#227). ...But, the severity of this rule isn't based on spec use of must or should, because there is no mention of speed in the spec or best practices (right?). There is a discrepancy between the severity on this repo (warning) and the severity at Google (critical error). @asvechnikov2 is this more of a Google specific criteria or it should be added in the spec with a "must" statement?
2.2. The interpretation of the value in stops.stops_timezone
Not my area of expertise byt yes that's confusing. If stops.stop_timezone
isn't there to be used to get times in trip planning, why is it there then? 🙈
We should definitely have an issue opened on google/transit for this.
I could not reproduce the issue. I ran the validator on the dataset given in this PR...I got 12 fast_travel_between_far_stops but no mention of stop NDL
That is b/c in order to have their GTFS file validate in Google, etc (in the validator) they had to remove the whole Southwest Chief line in their published GTFS.
Submitted this issue: https://github.com/google/transit/issues/322
Thank you! 🙏
This has been done last year (https://github.com/google/transit/pull/277). ...But, the severity of this rule isn't based on spec use of must or should, because there is no mention of speed in the spec or best practices (right?). There is a discrepancy between the severity on this repo (warning) and the severity at Google (critical error). @asvechnikov2 is this more of a Google specific criteria or it should be added in the spec with a "must" statement?
For severity there are two types of notices
- Vehicle travels fast - just a warning, won't affect results too much.
- Vehicle travels fast, but also across big distance (usually many km). This most likely will break (or affect in some significant way) routing, if there's a train that travels from one side of a city (or even worse country) to another in just couple of minutes or seconds then this train will dominate all routing results. In some cases it would make sense instead of taking 4 hours train to wait 3 hours 58 minutes and take 2 min train.
RE: must/should. should
assumes there are some other options available, however, there are none. Everyone must
use agency_timezone
otherwise producers/consumers will refer to different times and show users wrong/misleading information.
2.2. The interpretation of the value in stops.stops_timezone Not my area of expertise byt yes that's confusing. If stops.stop_timezone isn't there to be used to get times in trip planning, why is it there then? 🙈
We have a limited use of it, not sure if we should use it more or less though. It could be interesting to know how other consumers use it.
Thanks for the replies, all!
From the Proposed Solution https://github.com/google/transit/issues/322, it is clear now that Google uses the agency timezone -- therefore from Google's end we agree that Elizabeth's proposal to better clarify the definitions is the best next step (updating on Google's end would require more time and discussion, and we would need to leave partners like Amtrak hanging until it is done).
Are there any concerns in updating the definition? If not, I'm actually not familiar with the process here -- @Elizabeth Sall @.***> are you able to make initiate that change?
Thanks!
On Wed, May 4, 2022 at 5:16 AM asvechnikov2 @.***> wrote:
This has been done last year (google/transit#277 https://github.com/google/transit/pull/277). ...But, the severity of this rule isn't based on spec use of must or should, because there is no mention of speed in the spec or best practices (right?). There is a discrepancy between the severity on this repo (warning) and the severity at Google (critical error). @asvechnikov2 https://github.com/asvechnikov2 is this more of a Google specific criteria or it should be added in the spec with a "must" statement?
For severity there are two types of notices
- Vehicle travels fast - just a warning, won't affect results too much.
- Vehicle travels fast, but also across big distance (usually many km). This most likely will break (or affect in some significant way) routing, if there's a train that travels from one side of a city (or even worse country) to another in just couple of minutes or seconds then this train will dominate all routing results. In some cases it would make sense instead of taking 4 hours train to wait 3 hours 58 minutes and take 2 min train.
RE: must/should. should assumes there are some other options available, however, there are none. Everyone must use agency_timezone otherwise producers/consumers will refer to different times and show users wrong/misleading information.
2.2. The interpretation of the value in stops.stops_timezone Not my area of expertise byt yes that's confusing. If stops.stop_timezone isn't there to be used to get times in trip planning, why is it there then? 🙈
We have a limited use of it, not sure if we should use it more or less though. It could be interesting to know how other consumers use it.
— Reply to this email directly, view it on GitHub https://github.com/MobilityData/gtfs-validator/issues/1110#issuecomment-1117093241, or unsubscribe https://github.com/notifications/unsubscribe-auth/AYSI6V2KJ65UJQCQYVS4GMDVII57PANCNFSM5PSMNSCA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
--
- • *Kevin Perkins
- • *Product Solutions Engineer *• * Google, Inc.
- •* @.***
It's now clarified in the spec that stop_times.arrival_time
and stop_times.departure_time
use agency.agency_timezone
(many thanks to those who contributed and @e-lo for raising this!)
Based on the discussion here, it looks like the spec clarification was the only outstanding step and there are no further changes to be made to the validator. @e-lo I'll leave this issue open for a few days before closing in case there's anything else you or others would like to raise related to this.