Best Practice: stop name shouldn't include 'stop' or 'station' unless that is really part of its name (WARNING)

Open e-lo opened this issue 3 years ago • 6 comments

By default, stop_name should not contain generic or redundant words like “Station” or “Stop”, but some edge cases are allowed:

  - When it is actually part of the name (Union Station, Central Station).
  - When the stop_name is too generic (such as if it is the name of the city). “Station”, “Terminal”, or other words make the meaning clear.

e-lo avatar Apr 29 '21 05:04 e-lo

Good point. The problems are:

(a) implement that for a long list of languages (well, we can cover some 20-30 top languages)

(b) avoid false positives when "Station" or "Stop" is an integral part of the name.

aababilov avatar May 02 '21 10:05 aababilov

(a) implement that for a long list of languages (well, we can cover some 20-30 top languages)

We could possibly leverage the newly defined country code CLI parameter to retrieve the main languages spoken in a country. If it is not provided, we could check against a pre-defined list of languages, and additional languages could be added as needed in the future. But I agree that from a performance point of view it might be a problem.
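The lookup described above could be sketched roughly as follows. This is only an illustration in Python, not gtfs-validator code: the tables `GENERIC_WORDS_BY_LANGUAGE` and `LANGUAGES_BY_COUNTRY`, the `DEFAULT_LANGUAGES` fallback, and the function `generic_words_for` are all hypothetical names, and the word lists are tiny placeholders rather than curated data.

```python
# Hypothetical lookup tables; a real validator would need curated, reviewed data.
GENERIC_WORDS_BY_LANGUAGE = {
    "en": {"stop", "station"},
    "fr": {"arrêt", "gare"},
    "de": {"haltestelle", "bahnhof"},
}

# Hypothetical mapping from ISO country code to main languages.
LANGUAGES_BY_COUNTRY = {
    "US": ["en"],
    "CA": ["en", "fr"],
    "DE": ["de"],
}

# Fallback used when no (or an unknown) country code is provided.
DEFAULT_LANGUAGES = ["en", "fr", "de"]


def generic_words_for(country_code=None):
    """Collect the generic words to check, based on an optional country code."""
    key = (country_code or "").upper()
    languages = LANGUAGES_BY_COUNTRY.get(key, DEFAULT_LANGUAGES)
    words = set()
    for language in languages:
        words |= GENERIC_WORDS_BY_LANGUAGE.get(language, set())
    return words
```

Restricting the word set by country keeps the per-stop check small, which may ease the performance concern, at the cost of maintaining the country-to-language table.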

(b) avoid false positives when "Station" or "Stop" is an integral part of the name.

A possible solution could be to always generate a WARNING and let data producers/consumers decide on whether or not an entry should be fixed.

lionel-nj avatar May 04 '21 17:05 lionel-nj

A possible solution could be to always generate a WARNING and let data producers/consumers decide on whether or not an entry should be fixed.

Perhaps the warning should be threshold-based? My gut tells me that the big issues are when almost every stop has the word "stop" in its name (same with station for rail). Perhaps flag after a certain %?
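A minimal sketch of what a threshold-based check might look like, in Python for illustration only. The function names `generic_name_share` and `should_warn`, the default word list, and the 0.5 threshold are all assumptions made for this example; the actual cut-off would have to come from empirical data.

```python
import re


def generic_name_share(stop_names, generic_words):
    """Return the fraction of stop names containing a generic word as a whole word."""
    if not stop_names or not generic_words:
        return 0.0
    # \b keeps "stop" from matching inside longer words such as "stopover".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in sorted(generic_words)) + r")\b",
        re.IGNORECASE,
    )
    flagged = sum(1 for name in stop_names if pattern.search(name))
    return flagged / len(stop_names)


def should_warn(stop_names, generic_words=("stop", "station"), threshold=0.5):
    """Warn only when the share of generic names reaches the (assumed) threshold."""
    return generic_name_share(stop_names, generic_words) >= threshold
```

With this approach a feed containing a single "Union Station" would stay quiet, while a feed where most names end in "Stop" would be flagged once for the whole feed rather than once per stop.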

e-lo avatar May 04 '21 18:05 e-lo

Perhaps the warning should be threshold-based? My gut tells me that the big issues are when almost every stop has the word "stop" in its name (same with station for rail). Perhaps flag after a certain %?

@e-lo @barbeau do you think that assessing this threshold from empirical data would be advisable?

lionel-nj avatar May 04 '21 18:05 lionel-nj

do you think that assessing this threshold from empirical data would be advisable?

I think so. Wouldn't this issue be flagged as part of the manual GTFS grading process?

barbeau avatar May 04 '21 18:05 barbeau

flagged as part of the manual GTFS grading process?

I think both would be useful, but food for thought:

  1. In our experience the grading process already takes a long time, which has caused us to limit how often we do it. We'd like to save the time spent in the grading scheme for things that we actually cannot flag using 'computer brain'
  2. The sampling in the grading scheme may, in fact, not flag egregious use of names
  3. Future development which supports profiles should allow parameterization of a "cut off" for the warning (or silencing it altogether)
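The third point could be sketched as a small configurable rule. This is a hypothetical Python illustration, not an existing gtfs-validator feature: the class `GenericStopNameRule`, its `cutoff` field, and the default value of 0.5 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenericStopNameRule:
    # Hypothetical profile setting: a share between 0 and 1,
    # or None to silence the warning altogether.
    cutoff: Optional[float] = 0.5

    def evaluate(self, flagged_count, total_count):
        """Return a warning message, or None when the rule is silent or not triggered."""
        if self.cutoff is None or total_count == 0:
            return None
        share = flagged_count / total_count
        if share >= self.cutoff:
            return f"WARNING: {share:.0%} of stop names contain a generic word"
        return None
```

A profile could then set `cutoff=0.9` for a lenient check, or `cutoff=None` to switch the warning off.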

e-lo avatar May 04 '21 19:05 e-lo