
Modify DecreasingOrEqualStopTimeDistanceNotice

isabelle-dr opened this issue 3 years ago • 6 comments

The spec for stop_times.shape_dist_traveled mentions (link):

Values used for shape_dist_traveled must increase along with stop_sequence; they must not be used to show reverse travel along a route.

Currently, the validator treats both decreasing values and equal values of shape_dist_traveled as an error. We should differentiate decreasing values (error) from equal values (warning). The same discussion happened for the rule DecreasingOrEqualShapeDistanceNotice, which was separated into different rules.
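A minimal sketch of the proposed split might look like the following. This is only an illustration, not the validator's actual notice code: it uses a hypothetical StopTime record and plain enums instead of the project's real classes, and it assumes the stop times of a trip are already ordered by stop_sequence.

```java
// Hedged sketch only; this is not the gtfs-validator's real notice code.
// It assumes a hypothetical StopTime record and plain enums, and that the
// stop times of one trip are already ordered by stop_sequence.
import java.util.ArrayList;
import java.util.List;

public class StopTimeDistanceCheckSketch {

  record StopTime(int stopSequence, Double shapeDistTraveled) {}

  enum Severity { ERROR, WARNING }

  record Finding(Severity severity, int prevSequence, int currSequence, String message) {}

  static List<Finding> check(List<StopTime> orderedStopTimes) {
    List<Finding> findings = new ArrayList<>();
    StopTime prev = null;
    for (StopTime curr : orderedStopTimes) {
      if (prev != null && prev.shapeDistTraveled() != null && curr.shapeDistTraveled() != null) {
        double delta = curr.shapeDistTraveled() - prev.shapeDistTraveled();
        if (delta < 0) {
          // Reverse travel contradicts the spec outright: report an error.
          findings.add(new Finding(Severity.ERROR, prev.stopSequence(), curr.stopSequence(),
              "shape_dist_traveled decreases along stop_sequence"));
        } else if (delta == 0) {
          // Equal values are suspicious but often a precision artifact: report a warning.
          findings.add(new Finding(Severity.WARNING, prev.stopSequence(), curr.stopSequence(),
              "shape_dist_traveled does not increase along stop_sequence"));
        }
      }
      prev = curr;
    }
    return findings;
  }
}
```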

Additional context in this comment.

isabelle-dr avatar Jan 06 '22 12:01 isabelle-dr

Hi @isabelle-dr,

I'm writing here since it's related to both stop time distance and shapes distance validations.

Sorry I skipped reviewing the shapes-related validation. Now, while trying to sync a new version of the gtfs-validator, I encountered a lot of GTFS feeds that trigger EqualShapeDistanceDiffCoordinatesNotice, which is an ERROR. I decided to take a look at the spec and the code; here are the issues that I found and what we may need to do about them:

  • The specification doesn't seem to say what units must be used, which means we can't effectively write a validation rule that uses both the lat/lon coordinates and the distance (more on this below). We might need to set an expectation that it must be meters, kilometers, or miles, with the caveat that producers and consumers must use the same units. Alternatively, we can't have validation rules that use both.
  • Since we don't know the distance units, we can't make the right decision on precision. For instance, if we're given GPS coordinates with centimeter precision but distances in kilometers with 2-digit precision, the distance values will be identical for stops within 10 meters of each other. We either have to incorporate the distance units into the validation as well, or it will be very noisy (a small numeric sketch follows this list).
  • Another observation is that during the implementation there were 175 datasets failing with the new error, which is more than 10% of all datasets. This looks like a lot, and it raises two questions:
    • Should the change have been blocked, since more than 1% of datasets fail?
    • With that many failing datasets, some investigation must have been done; however, I didn't see any comments about why so many failing datasets are considered fine.
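To make the precision point concrete, here is a small hypothetical illustration, not validator code. It assumes a producer that publishes shape_dist_traveled in kilometers rounded to 2 decimals; the spec does not mandate any unit or precision, which is exactly the ambiguity being described.

```java
// Hedged numeric illustration, not validator code. It assumes (hypothetically)
// a producer that publishes shape_dist_traveled in kilometers rounded to
// 2 decimals; the spec does not mandate any unit, which is the problem.
public class DistancePrecisionSketch {
  public static void main(String[] args) {
    // Cumulative distances (in meters) to two consecutive shape points 8 m apart.
    double firstMeters = 12_336.0;
    double secondMeters = 12_344.0;

    // Exported as kilometers with 2-digit precision.
    double firstKm = Math.round(firstMeters / 10.0) / 100.0;   // 12.34
    double secondKm = Math.round(secondMeters / 10.0) / 100.0; // 12.34

    // The coordinates differ (the points are 8 m apart), yet the published
    // distances compare equal, so a rule combining "equal distance" with
    // "different coordinates" would flag perfectly reasonable data.
    System.out.println(firstKm + " == " + secondKm + " -> " + (firstKm == secondKm));
  }
}
```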

asvechnikov2 avatar Apr 08 '22 05:04 asvechnikov2

  • Another observation is that during the implementation there were 175 datasets failing with the new error, which is more than 10% of all datasets. This looks like a lot, and it raises two questions:

    • Should the change have been blocked, since more than 1% of datasets fail?
    • With that many failing datasets, some investigation must have been done; however, I didn't see any comments about why so many failing datasets are considered fine.

I don't have any background info about this guideline. My 2 cents on this: If 10% of all datasets have properties that we decide to consider an error, then we shouldn't block this change, because the goal of this tool is to nudge agencies to provide better feeds, right?

derhuerst avatar Apr 26 '22 20:04 derhuerst

  • Another observation is that during the implementation there were 175 datasets failing with the new error, which is more than 10% of all datasets. This looks like a lot, and it raises two questions:

    • Should the change have been blocked, since more than 1% of datasets fail?
    • With that many failing datasets, some investigation must have been done; however, I didn't see any comments about why so many failing datasets are considered fine.

I don't have any background info about this guideline. My 2 cents on this: If 10% of all datasets have properties that we decide to consider an error, then we shouldn't block this change, because the goal of this tool is to nudge agencies to provide better feeds, right?

Absolutely! The tricky part is "properties that we decide to consider an error", though. It might so happen that the decision doesn't take all possible edge cases into account, the way it happened here. So we need to make sure to document decisions, explaining why particular encountered cases (or hypothetical ones) are considered wrong and must be fixed. Having a rule that invalidates 10% of datasets is a really good reason to make sure there are no valid use cases that the rule will treat as invalid.

To clarify my opinion: I don't think it should be "If more than 1% of datasets fail, then block the change." (IIRC it was this way when the acceptance test was introduced.) It should be "If more than 0% of datasets fail, let's make sure we know that all failures are expected and document this."

asvechnikov2 avatar Apr 28 '22 01:04 asvechnikov2

Hello @asvechnikov2, thanks for opening up this discussion! The work in PR #1083 and the discussions in issue #1070 were a lot 🙈. I would like to share some additional context. Here is an attempt to answer the three issues you flagged.

The acceptance test and the "175 failing datasets"

it should be "If more than 0% of datasets fail, let's make sure we know that all failures are expected and document this."

I agree. The process here has not been clearly defined, and we have an issue open to improve this test. Since we implemented the architecture needed to run this test on all the datasets we have in the Mobility Database, we have mostly used it to fix false-positive errors (PR #1082, PR #1062). The work in PR #1083 is also a fix for false-positive errors. We didn't get any additional failing datasets with it, because we replaced an existing error with 2 new errors and 1 warning, which led to approx. 100 datasets going from invalid to valid. The initial version of these stats is here.

  • Before PR #1083 (the v3.0.0 release & before):

    Notice                                    Severity   # of datasets that trigger it
    DecreasingOrEqualShapeDistanceNotice      ERROR      277

  • After PR #1083 (the v3.0.1 release):

    Notice                                    Severity   # of datasets that trigger it
    DecreasingShapeDistanceNotice             ERROR      2
    EqualShapeDistanceDiffCoordinatesNotice   ERROR      175
    EqualShapeDistanceSameCoordinatesNotice   WARNING    272

The acceptance test shows the list of datasets that get an additional error (regardless of whether they were previously valid or invalid). Since EqualShapeDistanceDiffCoordinatesNotice was a new notice, the test flagged all the datasets that trigger it (but they had all previously been invalidated by DecreasingOrEqualShapeDistanceNotice).

🤝🤝 -- I would love to collaborate with you on improving these tests -- 🤝🤝

The spec interpretation and the use of lat/lon for validation

The spec mentions:

Values used for shape_dist_traveled must increase along with stop_sequence; they must not be used to show reverse travel along a route.

Using our current interpretation, this means that equal values in shape_dist_traveled = ERROR, regardless of the lat/lon coordinates. But since we noticed that the majority of the datasets triggered by the previous DecreasingOrEqualShapeDistanceNotice actually had the same distance values AND the same lat/lon values, it didn't seem fair to treat them as invalid data: it looked more like a scheduling-software precision issue (or something else?) than bad data. We inspected data that had the same values for shape_dist_traveled but different coordinates, and they all seemed like true positives: actual problems that people should fix.

This is why we created the two notices EqualShapeDistanceDiffCoordinatesNotice (error) and EqualShapeDistanceSameCoordinatesNotice (warning). We discussed using the lat/lon values for validation, but we didn't end up doing it. The only thing we do is check whether they are equal between two consecutive records.
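For reference, here is a rough sketch of that decision logic. It uses a hypothetical ShapePoint record rather than the real gtfs-validator classes, and it only illustrates the behaviour described above; it does not mirror the actual implementation. Points are assumed to be ordered by shape_pt_sequence within one shape.

```java
// Hedged sketch of the decision described above, using a hypothetical
// ShapePoint record instead of the real gtfs-validator classes.
import java.util.ArrayList;
import java.util.List;

public class ShapeDistanceNoticeSketch {

  record ShapePoint(int sequence, double lat, double lon, Double shapeDistTraveled) {}

  enum Kind { DECREASING_ERROR, EQUAL_DIFF_COORDINATES_ERROR, EQUAL_SAME_COORDINATES_WARNING }

  static List<Kind> classify(List<ShapePoint> orderedPoints) {
    List<Kind> findings = new ArrayList<>();
    ShapePoint prev = null;
    for (ShapePoint curr : orderedPoints) {
      if (prev != null && prev.shapeDistTraveled() != null && curr.shapeDistTraveled() != null) {
        if (curr.shapeDistTraveled() < prev.shapeDistTraveled()) {
          findings.add(Kind.DECREASING_ERROR);
        } else if (curr.shapeDistTraveled().equals(prev.shapeDistTraveled())) {
          // Coordinates are only compared for equality between the two records;
          // the distance values are never checked against geographic distance.
          boolean sameCoordinates = prev.lat() == curr.lat() && prev.lon() == curr.lon();
          findings.add(sameCoordinates
              ? Kind.EQUAL_SAME_COORDINATES_WARNING
              : Kind.EQUAL_DIFF_COORDINATES_ERROR);
        }
      }
      prev = curr;
    }
    return findings;
  }
}
```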

So the validator is actually slightly less strict than the spec here, and I am assuming you are integrating v3.0.1 coming from an earlier version, before DecreasingOrEqualShapeDistanceNotice was implemented. (😅 This could have been a 1-hour presentation, I hope this is clear!)

isabelle-dr avatar May 04 '22 21:05 isabelle-dr

I also see the difference in how people use this validator, between what @derhuerst was saying:

the goal of this tool is to nudge agencies to provide better feeds, right?

And @asvechnikov2 that relies on this tool as a data consumer:

Having a rule that invalidates 10% of datasets provides really good reason to make sure that there are no valid use cases that we will consider invalid by the rule.

I think we already have datasets that are treated as invalid by this canonical validator, but that should be available to users on Google Maps.

One way to go about this is to keep certain notices that should be errors but invalidate many datasets as warnings in the canonical validator for a certain period of time, and inform users when the severity will be upgraded, so producers fix their feeds.

Another way is to implement custom validation, so Google can share their criteria if they differ. In this option, you would communicate "in 6 months, this rule will match the canonical severity" so producers fix their feeds.

(I think we should do a bit of both) Again, would love to collaborate to make this happen 🤝

isabelle-dr avatar May 04 '22 21:05 isabelle-dr

Another way is to implement custom validation, so Google can share their criteria if they differ. In this option, you would communicate "in 6 months, this rule will match the canonical severity" so producers fix their feeds.

While I see the need for this, I think we should try to avoid it at all costs.

Doing this essentially forks the standard into several variants, just as has already happened in practice with different GTFS documentation websites and different interpretations of the spec. The goal should be to move towards one concise spec that everyone follows; only this way can a standard like GTFS fulfil its potential.

derhuerst avatar May 23 '22 07:05 derhuerst