[BUG] XQTS tests failing silently
### Describe the bug
XQTS tests do not finish in CI, but the run is marked as successful. Test runs cannot be compared, as the tests that are reported are just a small fraction of the entire test suite.
This became visible when @dizzzz asked for confirmation that #5946 fixed the issue and also did not introduce regressions.
### Expected behavior
XQTS to be marked as successful only if the entire test suite ran, and test results that can be used to compare compatibility with the XQuery 3.1 specification.
### To Reproduce
If the above isn't working, please tell us the exact steps you took when you encountered the problem:
- check out the develop branch
- run
  ```shell
  mvn clean package
  ```
- run
  ```shell
  exist-xqts/target/exist-xqts-7.0.0-SNAPSHOT-dir/bin/exist-xqts-runner.sh \
    --xqts-version HEAD \
    --output-dir /tmp/xqts-test-output \
    --exclude-test-case RangeExpr-411d,RangeExpr-409d,RangeExpr-408d,RangeExpr-409c,RangeExpr-408c,GenCompEq-2
  ```
- go to /tmp/xqts-test-output
- only a small portion of the more than 31 thousand tests is reported, and the set of reported tests varies between runs
### Context (please always complete the following information)
- Build: develop HEAD
- Java: Liberica 21 and openjdk version "21.0.6" 2025-01-21 LTS
- OS: GitHub Actions runner and macOS 15.6.1
### Additional context
- How is eXist-db installed? N/A
- Any custom changes in e.g. `conf.xml`? N/A
Root cause is likely that exist-xqts-runner was not compiled against a current snapshot build of eXist-db. The snapshot build used in CI is not working; I was able to get correct results with a fresh snapshot build.
@line-o hmm, looking at the xqts workflow config, it is indeed only meant to execute a runner and then run a custom comparison XSLT on the output for human consumption. These steps succeed, which makes the job green. The results of these actions aren't tied to any assertions that could fail, despite errors already being visible, e.g. here.
I'm not sure if the test-runner actually sets the correct exit code on failing tests - do you know? We need to make sure it finishes running even with failures, and then tie the results and the comparison to assertions, to get meaningful machine-readable CI output.
The missing SNAPSHOTs are equally a concern; the CI step is similarly green, and if it isn't uploading new packages we have a problem. We might have to delete older versions before uploading new ones? GitHub's docs on the Maven registry are sadly shorter than this comment.
We should make the job depend on the SNAPSHOT upload, to make sure it uses the correct packages.
@duncdrum I designed and built the xqts-runner and integrated it with CI, so I can add some clarity here:
- It is designed to be compiled against a specific version of eXist-db, as it uses eXist-db as an embedded database; this allows the entire test suite to be run as quickly as possible. There are various versions of xqts-runner for various versions of eXist-db - check the `build.sbt` file to find compatibility details. It's pretty simple to make sure you are using the right version.
- It is designed to run all the tests that you request; it doesn't matter if they pass/fail/error/skip, all will be run and the results recorded in a standard JUnit XML report.
- I integrated it with CI, which does the following:
  - Runs xqts-runner against the specified qt tests
  - Uploads the JUnit report to CI (for later use)
  - Downloads the JUnit report from a previous run in CI against the same branch (if available)
  - If the previous JUnit report is available, it compares the previous and the current report using XSLT, and produces a summary report
  - The summary report is printed out in CI and uploaded to CI as an artifact
  - Only if there has been an increase in errors or failures is the CI run marked as failed.
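A minimal sketch of that last step - failing only when failures or errors increase - assuming plain JUnit `failures`/`errors` attributes. The report contents and paths here are stand-ins; the real workflow performs this comparison via XSLT:

```shell
#!/bin/sh
# Sketch of the comparison step: mark the run failed only when the
# current report has more failures+errors than the previous one.
# Stand-in reports so the sketch is self-contained.
prev=$(mktemp); curr=$(mktemp)
echo '<testsuite tests="31000" failures="120" errors="4"/>' > "$prev"
echo '<testsuite tests="31000" failures="118" errors="4"/>' > "$curr"

# Sum failures + errors across all <testsuite> elements of a report.
badness() {
  grep -Eo '(failures|errors)="[0-9]+"' "$1" |
    grep -Eo '[0-9]+' | awk '{s+=$1} END {print s+0}'
}

if [ "$(badness "$curr")" -gt "$(badness "$prev")" ]; then
  echo "More failures/errors than the previous run - marking CI as failed" >&2
  exit 1
fi
echo "No increase in failures/errors"
```

Note that this logic is green both when nothing ran at all and when the previous report is missing - which is exactly the gap described in this issue.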
@adamretter thank you, this is helpful. The XSLT is setting the exit status when the comparison shows a decrease, right? So to avoid the situation we are in now, we would need to assert that the number of executed tests is as expected before running the comparison.
We also need to account for the case when there is nothing to compare, e.g. due to network issues.
Ideally we would also get smarter about the failing tests, so that a PR doesn't swap out e.g. 5 newly green fn:map tests for 5 now-failing fn:array tests without us noticing.
> So to avoid the situation we are in now, we would need to assert that the number of executed tests is as expected before running the comparison.
The number of executed tests should not change very often (if at all, ever). The major QT spec features that eXist-db implements (or doesn't) have been fixed for a very long time, and it seems unlikely that will ever change; i.e. eXist-db does not implement static analysis, and eXist-db does not implement Schema.
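That suggests the guard can be a simple fixed threshold. A sketch of such a completeness check over the JUnit report; the report content, path, and threshold here are hypothetical stand-ins, not the actual workflow values:

```shell
#!/bin/sh
# Sketch: before running the comparison, assert that the JUnit report
# accounts for the expected number of tests.
# Stand-in report so the sketch is self-contained; in CI this would be
# the report produced by exist-xqts-runner.
report=$(mktemp)
cat > "$report" <<'EOF'
<testsuites>
  <testsuite name="fn-map" tests="12000" failures="3" errors="0"/>
  <testsuite name="prod" tests="20000" failures="1" errors="2"/>
</testsuites>
EOF

expected_min=31000   # assumed baseline, per the comment above

# Sum the tests="..." attributes across all <testsuite> elements.
actual=$(grep -Eo 'tests="[0-9]+"' "$report" |
  grep -Eo '[0-9]+' | awk '{s+=$1} END {print s+0}')

if [ "$actual" -lt "$expected_min" ]; then
  echo "XQTS run incomplete: only $actual of at least $expected_min tests reported" >&2
  exit 1
fi
echo "XQTS run complete: $actual tests reported"
```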
> We also need to account for the case when there is nothing to compare, e.g. due to network issues.
If it has started running in CI, then as the assets are downloaded from CI I doubt you would encounter a network issue; but if you do, you can handle that.
> Ideally we would also get smarter about the failing tests, so that a PR doesn't swap out e.g. 5 newly green fn:map tests for 5 now-failing fn:array tests without us noticing.
That's already implemented somewhere
We have three options:
- test for the existence of the `html` folder, which is generated at the end
- check the command-line output for signals that indicate success or failure
- modify the xqts runner script to have a non-zero exit code
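The first option could be a small guard in the workflow's shell step. A self-contained sketch; the output directory and the simulated run are assumptions for illustration:

```shell
#!/bin/sh
# Sketch of option 1: fail the CI step unless the html report folder
# exists after the runner finishes.
output_dir=$(mktemp -d)   # stands in for /tmp/xqts-test-output in CI

# exist-xqts-runner.sh would run here; simulate a completed run:
mkdir "$output_dir/html"

if [ ! -d "$output_dir/html" ]; then
  echo "html folder missing - run did not finish, failing the job" >&2
  exit 1
fi
echo "html folder present - run finished"
```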
Related: https://github.com/eXist-db/exist-xqts-runner/pull/38