
Improve acceptance rules

Open · maximearmstrong opened this issue 2 years ago • 7 comments

Feature request

It would be beneficial to add new rules to the acceptance tests. Currently, two rules are tested:

  • Does the validator that uses the new code contribution generate more new errors than allowed?
  • Were there too many corrupted sources/datasets during validation?

Although these rules are fundamental to the acceptance tests, other aspects are not considered: disappearing errors or warnings, unusual info notices, etc.
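For illustration, here is a minimal sketch of what these two existing checks amount to. The class and method names, thresholds, and report shapes below are placeholders for this sketch, not the actual comparator code:

```java
import java.util.Map;

// Sketch of the two existing acceptance rules, under assumed names and
// report shapes (not the actual comparator implementation).
public class AcceptanceChecks {

  // Placeholder thresholds for this sketch.
  private static final int NEW_ERROR_THRESHOLD = 1;
  private static final double CORRUPTED_SOURCES_RATIO_THRESHOLD = 0.05;

  /** Rule 1: does the new code contribution generate more new error types than allowed? */
  static boolean tooManyNewErrors(
      Map<String, Integer> referenceErrorCounts, Map<String, Integer> latestErrorCounts) {
    long newErrorTypes =
        latestErrorCounts.keySet().stream()
            .filter(code -> !referenceErrorCounts.containsKey(code))
            .count();
    return newErrorTypes > NEW_ERROR_THRESHOLD;
  }

  /** Rule 2: were too many sources/datasets corrupted during validation? */
  static boolean tooManyCorruptedSources(int corruptedSourceCount, int totalSourceCount) {
    return totalSourceCount > 0
        && (double) corruptedSourceCount / totalSourceCount > CORRUPTED_SOURCES_RATIO_THRESHOLD;
  }
}
```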

Is your feature request related to a problem? Please describe.

Specifically in #1188, the acceptance tests passed even though the initial implementation prevented the validator from checking most of the validation rules. The generated reports did not contain any errors or warnings, and they contained the unknown_file info notice for required files such as stop_times.txt. Although the acceptance tests were not originally designed to cover this use case, it would be nice if they did.

Proposed solution

First, I think adding two new rules would be helpful:

  • Are too many errors disappearing?
  • Are too many warnings disappearing?

Another potential rule would be:

  • Do the reports contain the unknown_file notice for required files?

This is an indicator of a problem, but it is very specific.
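To make the proposal concrete, here is a minimal sketch of the checks described above. The method names, thresholds, report shapes, and the REQUIRED_FILES set are assumptions for illustration only, not existing validator code:

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the proposed acceptance rules; names, thresholds,
// and report shapes are assumptions, not the actual validator code.
public class ProposedAcceptanceChecks {

  private static final int DISAPPEARING_NOTICE_THRESHOLD = 1;

  // Hypothetical subset of GTFS required files for the unknown_file check.
  private static final Set<String> REQUIRED_FILES =
      Set.of("agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt");

  /** Proposed rule: are too many error (or warning) types disappearing? */
  static boolean tooManyDisappearingNotices(
      Map<String, Integer> referenceCounts, Map<String, Integer> latestCounts) {
    long disappearing =
        referenceCounts.keySet().stream()
            .filter(code -> !latestCounts.containsKey(code))
            .count();
    return disappearing > DISAPPEARING_NOTICE_THRESHOLD;
  }

  /** Proposed rule: does a report flag a required file with an unknown_file notice? */
  static boolean requiredFileReportedUnknown(Set<String> filenamesWithUnknownFileNotice) {
    return filenamesWithUnknownFileNotice.stream().anyMatch(REQUIRED_FILES::contains);
  }
}
```

The disappearing-notice check could be run once per severity level (errors, then warnings), each with its own threshold.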

maximearmstrong · Aug 09 '22 15:08

Thanks for opening this! There is another issue about a next iteration on the acceptance tests: #1076. This one isn't a duplicate; the two issues flag different improvements to be made.

isabelle-dr · Aug 10 '22 12:08

I was looking at the acceptance test implementation and noticed that there is apparently code in acceptance_test.yml to post a comment on the pull request with the acceptance test status. I've never actually seen this in action before. Does it work?

Asking because I was taking an initial stab at improving the logic here and was thinking of refactoring some things, but I wanted to make sure I understood the existing behavior first.

bdferris-v2 · Aug 10 '22 16:08

Oh, you're right. It looks like it's working when PRs are made from this repo (see here and here), but not from forks.

Looking more closely at the run logs, it looks like the problem is related to the GitHub token. Here (from our repo) vs. here (from a fork).

This is because the token is accessed via a secret. To the best of my knowledge, GitHub disables using secrets in workflows originating from forks. @isabelle-dr How valuable do you think the comment is? Could we refactor the acceptance tests without it?

maximearmstrong · Aug 10 '22 18:08

Well, we can definitely improve the acceptance test without fixing the comment issue.

That said, I was looking at this method in the comparator and I think it could definitely be improved a bit. I was also thinking of combining acceptance_report.json and sources_corruption_report.json into one combined report (not sure why they need to be separate?). This would have some implications for the commenting code, but I'm sure I could keep it working the same either way (even if it doesn't actually work for me :)
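As a rough sketch, the merged report could be a single object holding both sets of fields. The field names below are guesses based on the two existing filenames, not the actual report schema:

```java
import java.util.List;
import java.util.Map;

// Rough sketch of a single combined acceptance report; field names are
// guesses based on the two existing report filenames, not the real schema.
public class CombinedAcceptanceReport {
  // Formerly acceptance_report.json: per-notice comparison between the
  // reference validator and the one built from the pull request.
  public Map<String, Integer> newErrorsPerNoticeCode;
  public Map<String, Integer> droppedErrorsPerNoticeCode;

  // Formerly sources_corruption_report.json: datasets that could not be
  // validated at all.
  public List<String> corruptedSourceIds;
  public int totalSourceCount;
}
```

Merging them this way would also mean the commenting code reads one file instead of two.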

bdferris-v2 · Aug 10 '22 18:08

I agree with you - I thought the same thing looking at the checkRuleValidity method. As for combining the reports: I don't think they need to be separate; that's how it was before I started working on the validator, so merging them works for me. @isabelle-dr, do you have any objection?

maximearmstrong · Aug 10 '22 19:08

"How valuable do you think the comment is?"

I'd say pretty valuable. The initial idea was to have something user-friendly to see the impact of a PR, and how this impact might change as the contributor pushes commits. It makes it easy to trace things back. See it kind of in action in #1083. The scope of this test changed as we were working on it (see this comment that followed a discussion with @asvechnikov2), and I see value in having, in this validator:

  1. tests to make sure everything is working properly
  2. the ability to analyze the impact of a new piece of code on production data - this is more of a functionality to me, whereas the above is about safety & stability.

These two "tests" might contain similar checks (e.g. see if many additional / disappearing errors) but we won't want to run them at the same time, and their output might not be used in the same way. The first one is simply blocking a merge, whereas the second could be used ad-hoc by the developer, and the results would be looked at in the review process with follow-up actions (inform agencies and consumers concerned, make it a major release, etc).

No strong opinion on combining the reports; this seems like a good idea.

isabelle-dr · Aug 10 '22 22:08

Also related: https://github.com/MobilityData/gtfs-validator/issues/1233

isabelle-dr · Aug 10 '22 22:08