ultimate
ultimate copied to clipboard
Skip unsupported regression tests
Currently many regression tests still fail (see #611). Therefore the Jenkins status ("unstable") is not a good indicator. However, some of the tests are expected to fail (and always have), because they only pass for some settings/toolchains (e.g. due to overapproximation).
This PR provides the possibility to mark those tests as skipped after running them. Therefore we store all tests (consisting of file, settings, toolchain) along with a verdict in a separate file. If a tests fails with this verdict, it is marked as skipped, otherwise as failed.
There are still some open points to discuss:
- Currently the files containing the verdicts need to be in the same folder as the settings or toolchains (and not the file itself!) to parse the skipped-file beforehand.
- In the current version the verdicts are the results as strings (e.g. TIMEOUT, UNKNOWN, EXCEPTION_OR_ERROR,...). This has two possible issues. First we need to convert these Strings to enums, which might fail. Second these categories might not be as precise as wanted. Therefore we could use the description (e.g.
Unable to prove ... Reason: overapproximation of ...
) as verdict in this skipped-file. - And of course, if it is reasonable to mark the given tests as skipped.
When you run a test suite with skipped tests, can you infer from the test suite result alone which tests where skipped? For example by looking into the TEST-*.xml
?
@schuessf Is there a reason why this isnt merged?
Okay, I revisited this PR after a long time 🙃 and updated a few files. These are the downsides and open questions:
- What should be name of the files that specify tests that are skipped (currently:
.skip
)? Should it be a fixed name or do we want to allow some pattern? - Currently the files have to be in the same folder as the settings or toolchains, but not the file itself. This allows to parse the file only once beforehand and not for every test iself.
- The verdict in the .skip-files can currently only be the result (e.g. UNKNOWN, TIMEOUT, ...), but not a substring of the more detailed error message. This would allow us a more precise check, whether the test should be indeed marked as skipped (or if an unexpected error occured).
If we can agree on these questions, this should be ready to be merged -- and finally all our tests might pass 😉
I would say:
- Keep
.skip
, do not allow different names. Easier to find withfind
and the like. - Keep it in the same folder as settings or toolchains. Just one concept, would be more complex if we had different ones. And parsing once is of course also important :)
- It would be nice to allow regexes here; then we could also exclude all results with
.*
.
I updated the code to use regular expressions that also match on the message and not only the result (7b67a7e), without changing the .skip files yet. If we also want to use regular expressions in the .skip files, I think the current space-separated format is not optimal (since there might be also spaces inside the regexes). Maybe we can change the format also to be more "category-based", to avoid duplicating the same regexes.
Yeah, using them in the .skip files would be needed. Category-based would also be nice. Perhaps use YAML or something for which we already have a parser?
Just to bump, not that it lays dormant for too long :)