gtfs-validator icon indicating copy to clipboard operation
gtfs-validator copied to clipboard

Best Practice: Include agency_id in routes.txt (WARNING)

Open e-lo opened this issue 3 years ago • 2 comments

Including agency_id in routes.txt is a best practice and should generate a warning even though it is not required if there is only one.

Implementatin thoughts: Rather than returning here in this override, generate a WARNING.

https://github.com/MobilityData/gtfs-validator/blob/52434454b747bb4085038f27563e98fee6018ca5/main/src/main/java/org/mobilitydata/gtfsvalidator/validator/TripAgencyIdValidator.java#L46-L59

e-lo avatar Apr 28 '21 17:04 e-lo

Where is this best practice coming from and what is the rationale? Does that also include a requirement to provide agency_id in agency.txt even for a single agency?

To be honest, I fully support your proposal. I am just looking for a strong rationale to convince the transit feeds to fix that warning.

aababilov avatar May 02 '21 10:05 aababilov

Where is this best practice coming from and what is the rationale?

I think that @e-lo refers to this part of the best practices:

Field Name Recommendation
agency_id Must be included if it is defined in agency.txt.

Does that also include a requirement to provide agency_id in agency.txt even for a single agency?

According to best practices for agency.txt, yes:

Field Name Recommendation
agency_id Should be included, even if there is only one agency in the feed. (See also recommendation to include agency_id in routes.txt and fare_attributes.txt)

lionel-nj avatar May 04 '21 17:05 lionel-nj

Just to clarify, I see a recommendation in Best Practices for agency.txt to include agency_id even if only one agency is specified (link), but I don't see similar text for routes.txt. Did that text get dropped at some point? Does it need to be added?

bdferris-v2 avatar Oct 27 '22 16:10 bdferris-v2

See my comment here 🙃

isabelle-dr avatar Oct 27 '22 16:10 isabelle-dr

@isabelle-dr Hello! I am taking a look at this. It seems like the most comprehensive solution would be:

  1. generate a warning if there is a single agency in agency.txt but agency_id is not provided
  2. generate a warning if agency_id is missing in routes.txt and a single agency is defined in agency.txt
  3. generate an error if agency_id is missing in routes.txt but is defined in agency.txt (with either one or multiple agencies defined in agency.txt)

Am I missing anything?

briandonahue avatar Jan 19 '23 16:01 briandonahue

@briandonahue I'd clarify that for (3) that I don't believe it's an error for agency_id to be missing from routes.txt if only a single agency is defined in agency.txt with a corresponding agency_id value. It's only an error in the multi-agency case.

bdferris-v2 avatar Jan 19 '23 22:01 bdferris-v2

@bdferris-v2 OK, I was looking at this comment (don't think I have access to see the original repo page)

I think that @e-lo refers to this part of the best practices:

Field Name Recommendation

agency_id Must be included if it is defined in agency.txt.

Does that also include a requirement to provide agency_id in agency.txt even for a single agency?

But I can remove that validation.

briandonahue avatar Jan 20 '23 14:01 briandonahue

@briandonahue thanks for working on this issue. You have access to the repo, but this link is broken because we removed this part of the Best Practices, and I believe this was a mistake (see my comment here).

  1. generate a warning if there is a single agency in agency.txt but agency_id is not provided

Correct, and this comes from the agency.txt in the GTFS Best Practices. I recommend using the @recommended annotation in the schema file, just like what Kevin did in #1156, which will lead to the missing_recommended_field notice. The "References" and "Affected files" sections of the documentation for this notice will have to be updated, because currently it's only used on fields in feed_info.txt

  1. generate a warning if agency_id is missing in routes.txt and a single agency is defined in agency.txt

Correct, although there is no mention of this anymore in the GTFS Best Practices for the reason I shared above. I will open a PR in the Best Practices repo shortly to fix that. Meanwhile, I propose you still add the validation, refer to the routes.txt when you update RULES.md, and we will just wait to have this fixed on the Best Practice repo before we merge yours.

  1. generate an error if agency_id is missing in routes.txt but is defined in agency.txt (with either one or multiple agencies defined in agency.txt)

This is a good catch! It's a portion of the spec we didn't have before. As @bdferris-v2 flagged, we have an error only if we have multiple agencies. This refers to the routes.txt specification.

I hope this is clear, let us know if you'd like to discuss how to implement 2 and 3, pick the names for the notices, etc.

isabelle-dr avatar Jan 23 '23 20:01 isabelle-dr

@isabelle-dr I've created a draft PR, thank you for the pointers on where to update the documentation, I need to do that before it's ready for review.

I recommend using the @recommended annotation in the schema file, just like what Kevin did in https://github.com/MobilityData/gtfs-validator/pull/1156

Would this be confusing since it is recommended in the case of a single agency, but required when more than one? I see a @ConditionallyRequired attribute on there now... I'm still learning, so thank you for your explanations!

briandonahue avatar Jan 24 '23 22:01 briandonahue

it is recommended in the case of a single agency, but required when more than one

I think you might be mixing the routes.txt and agency.txt files? They both have agency_id.

Here is another way to see this:

  1. agency.agency_id should be included if agency.txt is provided (WARNING - and I suggest we re-use MissingRecommendedField with the same logic we applied for feed_info.txt in #1156)
  2. routes.agency_id should be included if agency.txt contains at least 1 record, or 1 value for `agency_id (WARNING)
  3. routes.agency_id must be included if agency.txt contains more than one value for agency_id (ERROR)

Writing this, I realize that we have different options for how to implement 2 and 3. Option 1 - create a new MissingAgencyId notice, which could be an ERROR or a WARNING based on the conditions. ⚠️ I am not in favor of this option, because we have a longer-term vision of separating the notice from the severity, and allowing users to specify a different severity for a rule (ex: Cal-ITP might want to require routes.agency_id at all times, so they would want this to be an error regardless of the number of agencies defined in agency.txt).

Option 2 - create two new custom rules, similarly than what we have for feed_expiration_date_7_days and feed_expiration_date_30_days

Option 3 - find a way to trigger MissingRecommendedField (which is a WARNING) for case 2 and MissingRequiredField for case 3(which is an ERROR).

Let me know if you see this in any other way, happy to take this discussion to your PR :)

isabelle-dr avatar Jan 25 '23 15:01 isabelle-dr

@isabelle-dr I've removed custom errors, deferring to the existing MissingRecommendedField and MissingRequiredField depending on single or multiple agencies scenarios. I was looking at updating the Best Practices documentations, and added some questions here

briandonahue avatar Jan 25 '23 19:01 briandonahue

@briandonahue not sure where you are at with this but Alexei implemented a similar logic here, based on the following statement in the spec:

fare_attributes.agency_id is Required if multiple agencies are defined in agency.txt

isabelle-dr avatar Feb 06 '23 17:02 isabelle-dr