gtfs-validator
Best Practice: stop name shouldn't include 'stop' or 'station' unless that is really part of its name (WARNING)
By default, stop_name should not contain generic or redundant words like “Station” or “Stop”, but some edge cases are allowed:
- When it is actually part of the name (Union Station, Central Station).
- When the stop_name is too generic (such as if it is the name of the city). In that case, “Station”, “Terminal”, or other words make the meaning clear.
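As a starting point for discussion, the basic check could look something like the sketch below. It is not the validator's implementation; the word list and the function name `find_generic_words` are made up for illustration. It matches whole words only, case-insensitively, so a name like "Stopford Road" is not flagged.

```python
from __future__ import annotations

import re

# Hypothetical word list; a real check would need per-language lists.
GENERIC_WORDS = ("stop", "station")

# Match whole words only, so "Stopford Road" is not flagged for "stop".
_GENERIC_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, GENERIC_WORDS)) + r")\b",
    re.IGNORECASE,
)


def find_generic_words(stop_name: str) -> list[str]:
    """Return the generic words found in stop_name, lowercased, in order."""
    return [m.group(1).lower() for m in _GENERIC_RE.finditer(stop_name)]
```

Note that this naive version also flags "Union Station", which is exactly the edge case described above, so on its own it can only justify a warning, not an error.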
Good point. The problems are:
(a) implement that for a long list of languages (well, we can cover some 20-30 top languages)
(b) avoid false positives when "Station" or "Stop" is an integral part of the name.
> (a) implement that for a long list of languages (well, we can cover some 20-30 top languages)
We could possibly leverage the newly defined country code CLI parameter to retrieve the main languages spoken in a country. If it is not provided, we could check the predefined language list, and additional languages could be added as needed in the future. But I agree that from a performance point of view it might be a problem.
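A rough sketch of that lookup, assuming the country code is available as a plain string (how it reaches the check is left open here). Both tables and the name `generic_words_for` are hypothetical and far from complete; they only illustrate the fallback to a predefined language list.

```python
# Illustrative data only: both tables are hypothetical and incomplete.
LANGUAGES_BY_COUNTRY = {
    "CA": ["en", "fr"],
    "DE": ["de"],
    "US": ["en", "es"],
}
DEFAULT_LANGUAGES = ["en", "fr", "de", "es"]

GENERIC_WORDS_BY_LANGUAGE = {
    "en": {"stop", "station"},
    "fr": {"arrêt", "gare", "station"},
    "de": {"haltestelle", "bahnhof"},
    "es": {"parada", "estación"},
}


def generic_words_for(country_code=None):
    """Collect generic words for the languages of a country.

    Falls back to the predefined language list when no country code is
    given or the code is unknown.
    """
    key = country_code.upper() if country_code else None
    languages = LANGUAGES_BY_COUNTRY.get(key, DEFAULT_LANGUAGES)
    words = set()
    for lang in languages:
        words |= GENERIC_WORDS_BY_LANGUAGE.get(lang, set())
    return words
```

Restricting the word set to a country's languages keeps the number of comparisons per stop small, which may ease the performance concern.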
> (b) avoid false positives when "Station" or "Stop" is an integral part of the name.
A possible solution could be to always generate a WARNING and let data producers and consumers decide whether or not an entry should be fixed.
Perhaps the warning should be threshold-based? My gut tells me that the big issues are when almost every stop has the word "stop" in its name (same with station for rail). Perhaps flag after a certain %?
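To make the threshold idea concrete, here is a minimal sketch. The function names are hypothetical, and the 0.5 default is a placeholder rather than a value derived from real feeds.

```python
import re


def share_with_word(stop_names, word):
    """Fraction of stop names that contain `word` as a whole word."""
    if not stop_names:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = sum(1 for name in stop_names if pattern.search(name))
    return hits / len(stop_names)


def should_warn(stop_names, word, threshold=0.5):
    """Warn only when the share of matching names exceeds the threshold.

    The 0.5 default is a placeholder, not an empirically derived value.
    """
    return share_with_word(stop_names, word) > threshold
```

With a feed where three of four stops end in "Stop" and one is "Union Station", this would warn for "stop" but stay quiet for "station", which is the behaviour the threshold is meant to give.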
@e-lo @barbeau do you think that assessing this threshold from empirical data would be advisable?
I think so. Wouldn't this issue be flagged as part of the manual GTFS grading process?
I think both would be useful, but food for thought:
- In our experience, the grading process already takes a long time, which has caused us to limit how often we do it. We'd like to save the time spent in the grading scheme for things that we actually cannot flag using 'computer brain'
- The sampling in the grading scheme may, in fact, not flag egregious use of names
- Future development which supports profiles should allow parameterization of a "cut off" for the warning (or silencing it altogether)
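For the last point, one possible shape for a parameterized cut-off is sketched below. Profiles do not exist yet, so `GenericNameProfile`, its `cutoff` field, and `warning_enabled` are all hypothetical names; the 0.5 default is again only a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenericNameProfile:
    """Hypothetical per-profile settings for the generic-stop-name warning."""

    # Share of stops (0.0 to 1.0) above which the warning fires;
    # None silences the warning altogether.
    cutoff: Optional[float] = 0.5


def warning_enabled(profile, matching_share):
    """Return True if the warning should be emitted under this profile."""
    if profile.cutoff is None:
        return False
    return matching_share > profile.cutoff
```

Using `None` for "silenced" keeps a cut-off of 0.0 (warn as soon as any stop matches) distinct from turning the warning off.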