gtfs-validator Best Practice: stop name shouldn't include 'stop' or 'station' unless that is really part of its name (WARNING)

Open e-lo opened this issue 3 years ago • 6 comments

By default, stop_name should not contain generic or redundant words like “Station” or “Stop”, but some edge cases are allowed:

- When it is actually part of the name (Union Station, Central Station).
- When the stop_name is too generic (such as if it is the name of the city). “Station”, “Terminal”, or other words make the meaning clear.

Apr 29 '21 05:04 e-lo

Good point. The problems are:

(a) implement that for a long list of languages (well, we can cover some 20-30 top languages)

(b) avoid false positives when "Station" or "Stop" is an integral part of the name.

May 02 '21 10:05 aababilov

> (a) implement that for a long list of languages (well, we can cover some 20-30 top languages)

We could possibly leverage the newly defined country code CLI parameter to retrieve the main languages spoken in a country. If it is not provided, we could fall back to a predefined list of languages, and additional languages could be added as needed in the future. But I agree that from a performance point of view it might be a problem.

> (b) avoid false positives when "Station" or "Stop" is an integral part of the name.

A possible solution could be to always generate a WARNING and let data producers/consumers decide whether or not an entry should be fixed.
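
As a rough illustration of the two ideas above (picking word lists from a country code, and always emitting a warning that producers or consumers can then triage), a minimal sketch might look like this. The word lists, the `GENERIC_WORDS_BY_LANGUAGE` and `LANGUAGES_BY_COUNTRY` tables, and all function names are hypothetical and are not taken from the validator's code.

```python
import re

# Hypothetical, deliberately tiny tables; real lists would need curation.
GENERIC_WORDS_BY_LANGUAGE = {
    "en": ["stop", "station"],
    "fr": ["arrêt", "gare"],
    "de": ["haltestelle", "bahnhof"],
}
LANGUAGES_BY_COUNTRY = {"US": ["en"], "CA": ["en", "fr"], "DE": ["de"]}
DEFAULT_LANGUAGES = ["en"]  # assumed fallback when no country code is given


def generic_words_for(country_code=None):
    """Return the generic words to look for, given an optional country code."""
    languages = LANGUAGES_BY_COUNTRY.get(country_code, DEFAULT_LANGUAGES)
    words = []
    for language in languages:
        words.extend(GENERIC_WORDS_BY_LANGUAGE.get(language, []))
    return words


def find_generic_words(stop_name, words):
    """Return the generic words that occur as whole words in stop_name."""
    found = []
    for word in words:
        # \b keeps "stop" from matching inside names such as "Stopford Road".
        if re.search(rf"\b{re.escape(word)}\b", stop_name, flags=re.IGNORECASE):
            found.append(word)
    return found


def warnings_for(stops, country_code=None):
    """Yield (stop_id, stop_name, matched_words) for every matching stop."""
    words = generic_words_for(country_code)
    for stop_id, stop_name in stops:
        matched = find_generic_words(stop_name, words)
        if matched:
            yield stop_id, stop_name, matched
```

Note that a name like "Union Station" would still be reported by this sketch: it cannot tell whether the word is an integral part of the name, which is why the decision is left to whoever reads the warning.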

May 04 '21 17:05 lionel-nj

> A possible solution could be to always generate a WARNING and let data producers/consumers decide whether or not an entry should be fixed.

Perhaps the warning should be threshold-based? My gut tells me that the big issues are when almost every stop has the word "stop" in its name (same with station for rail). Perhaps flag after a certain %?
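
A threshold-based check along these lines could be sketched as follows. The function names and the 0.5 default are placeholders chosen for illustration only; no threshold value has been agreed on here, and the English-only pattern is an assumption.

```python
import re

# Assumed English-only pattern; matches "stop" or "station" as whole words.
GENERIC_WORD = re.compile(r"\b(stop|station)\b", re.IGNORECASE)


def generic_name_share(stop_names):
    """Return the fraction of stop names that contain a generic word."""
    if not stop_names:
        return 0.0
    flagged = sum(1 for name in stop_names if GENERIC_WORD.search(name))
    return flagged / len(stop_names)


def should_warn(stop_names, threshold=0.5):
    """Warn only when the share of generic names exceeds the threshold."""
    return generic_name_share(stop_names) > threshold
```

With this approach a feed containing a single "Union Station" among many ordinary names stays quiet, while a feed where nearly every name ends in "Stop" is flagged once for the whole feed.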

May 04 '21 18:05 e-lo

> Perhaps the warning should be threshold-based? My gut tells me that the big issues are when almost every stop has the word "stop" in its name (same with station for rail). Perhaps flag after a certain %?

@e-lo @barbeau do you think that assessing this threshold from empirical data would be advisable?

May 04 '21 18:05 lionel-nj

> do you think that assessing this threshold from empirical data would be advisable?

I think so. Wouldn't this issue be flagged as part of the manual GTFS grading process?

May 04 '21 18:05 barbeau

> flagged as part of the manual GTFS grading process?

I think both would be useful, but food for thought:

- In our experience the grading process already takes a long time, which has caused us to limit the times we do it; we'd like to save the time spent in the grading scheme for things that we actually cannot flag using 'computer brain'.
- The sampling in the grading scheme may, in fact, not flag egregious use of names.
- Future development which supports profiles should allow parameterization of a "cut off" for the warning (or silencing it altogether).
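
To make the last point concrete, here is one way a profile could carry the cut-off, with `None` meaning the warning is silenced. `GenericNameProfile`, its `cutoff` field, and `emit_warning` are hypothetical names; profiles do not exist yet, so this is only a sketch of the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenericNameProfile:
    # Hypothetical setting: share of flagged stop names above which to warn.
    # None silences the warning altogether. 0.5 is an arbitrary placeholder.
    cutoff: Optional[float] = 0.5


def emit_warning(flagged_count, total_count, profile):
    """Return a warning message, or None if the profile suppresses it."""
    if profile.cutoff is None or total_count == 0:
        return None
    share = flagged_count / total_count
    if share <= profile.cutoff:
        return None
    return (
        f"{flagged_count} of {total_count} stop names ({share:.0%}) "
        "contain a generic word such as 'stop' or 'station'"
    )
```

A producer who knows their names are intentional could then run with `GenericNameProfile(cutoff=None)`, while a stricter consumer could lower the cut-off.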

May 04 '21 19:05 e-lo
