[BUG] XQTS tests failing silently
### Describe the bug
XQTS tests do not finish in CI, but the run is marked as successful. Test runs cannot be compared, as the tests that are reported are just a small fraction of the entire test suite.
This became visible when @dizzzz asked for confirmation that #5946 fixed the issue and also did not introduce regressions.
### Expected behavior
XQTS to be marked as successful only if the entire test suite ran, and test results that can be used to compare compatibility with the XQuery 3.1 specification.
### To Reproduce
If the above isn't working, please tell us the exact steps you took when you encountered the problem:
- check out the develop branch
- run
  ```shell
  mvn clean package
  ```
- run
  ```shell
  exist-xqts/target/exist-xqts-7.0.0-SNAPSHOT-dir/bin/exist-xqts-runner.sh \
    --xqts-version HEAD \
    --output-dir /tmp/xqts-test-output \
    --exclude-test-case RangeExpr-411d,RangeExpr-409d,RangeExpr-408d,RangeExpr-409c,RangeExpr-408c,GenCompEq-2
  ```
- go to /tmp/xqts-test-output
- only a small portion of the more than 31 thousand tests is reported, and the set of reported tests varies between runs
### Context (please always complete the following information)
- Build: develop HEAD
- Java: Liberica 21 and openjdk version "21.0.6" 2025-01-21 LTS
- OS: GitHub Actions runner and macOS 15.6.1
### Additional context
- How is eXist-db installed? N/A
- Any custom changes in e.g. `conf.xml`? N/A
Root cause is likely that exist-xqts-runner was not compiled against a current snapshot build of eXist-db. The snapshot build used in CI is not working; I was able to get correct results with a fresh snapshot build.
@line-o hmm, looking at the xqts workflow config, it is indeed only meant to execute a runner and then run a custom comparison XSLT on the output for human consumption. These steps succeed, which makes the job green. The results of these actions aren't tied to any assertions that could fail, despite errors already being visible, e.g. here.
I'm not sure if the test-runner actually sets the correct exit code on failing tests - do you know? We need to make sure it finishes running even with failures, and then tie the results and the comparison to assertions, to get meaningful machine-readable CI output.
The missing SNAPSHOTs are equally a concern; the CI step is similarly green, and if it isn't uploading new packages we have a problem. We might have to delete older versions before uploading new ones? GitHub's docs on the Maven registry are sadly shorter than this comment.
We should make the job depend on the SNAPSHOT upload, to make sure it uses the correct packages.
@duncdrum I designed and built the xqts-runner and integrated it with CI, so I can add some clarity here:
- It is designed to be compiled against a specific version of eXist-db, as it uses eXist-db as an embedded database; this allows the entire test suite to be run as quickly as possible. There are various versions of xqts-runner for various versions of eXist-db - check the `build.sbt` file to find compatibility details. It's pretty simple to make sure you are using the right version.
- It is designed to run all the tests that you request; it doesn't matter if they pass/fail/error/skip, all will be run and the results recorded in a standard JUnit XML report.
- I integrated it with CI, which does the following:
  - Runs xqts-runner against the specified qt tests
  - Uploads the JUnit report to CI (for later use)
  - Downloads the JUnit report from a previous run in CI against the same branch (if available)
  - If the previous JUnit report is available, it compares the previous and the current report using XSLT, and produces a summary report
  - The summary report is printed out in CI and uploaded to CI as an artifact
  - Only if there has been an increase in errors or failures is the CI run marked as failed.
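A minimal sketch of that last step - failing only when failures or errors increase - assuming plain JUnit `failures`/`errors` attributes. The report contents and paths here are stand-ins; the real workflow performs this comparison via XSLT:

```shell
#!/bin/sh
# Sketch of the comparison step: mark the run failed only when the
# current report has more failures+errors than the previous one.
# Stand-in reports so the sketch is self-contained.
prev=$(mktemp); curr=$(mktemp)
echo '<testsuite tests="31000" failures="120" errors="4"/>' > "$prev"
echo '<testsuite tests="31000" failures="118" errors="4"/>' > "$curr"

# Sum failures + errors across all <testsuite> elements of a report.
badness() {
  grep -Eo '(failures|errors)="[0-9]+"' "$1" |
    grep -Eo '[0-9]+' | awk '{s+=$1} END {print s+0}'
}

if [ "$(badness "$curr")" -gt "$(badness "$prev")" ]; then
  echo "More failures/errors than the previous run - marking CI as failed" >&2
  exit 1
fi
echo "No increase in failures/errors"
```

Note that this logic is green both when nothing ran at all and when the previous report is missing - which is exactly the gap described in this issue.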
@adamretter thank you, this is helpful. The XSLT is setting the exit status when the comparison shows a decrease, right? So to avoid the situation we are in now, we would need to assert that the number of executed tests is as expected before running the comparison.
We also need to account for the case when there is nothing to compare, e.g. due to network issues.
Ideally we would also get smarter about the failing tests, so that a PR doesn't swap out e.g. 5 newly green fn:map tests for 5 now-failing fn:array tests without us noticing.
> So to avoid the situation we are in now, we would need to assert that the number of executed tests is as expected before running the comparison.
The number of executed tests should not change very often (if at all, ever). The major QT spec features that eXist-db implements (or doesn't) have been fixed for a very long time, and it seems unlikely that will ever change; i.e. eXist-db does not implement static analysis, and eXist-db does not implement Schema.
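That suggests the guard can be a simple fixed threshold. A sketch of such a completeness check over the JUnit report; the report content, path, and threshold here are hypothetical stand-ins, not the actual workflow values:

```shell
#!/bin/sh
# Sketch: before running the comparison, assert that the JUnit report
# accounts for the expected number of tests.
# Stand-in report so the sketch is self-contained; in CI this would be
# the report produced by exist-xqts-runner.
report=$(mktemp)
cat > "$report" <<'EOF'
<testsuites>
  <testsuite name="fn-map" tests="12000" failures="3" errors="0"/>
  <testsuite name="prod" tests="20000" failures="1" errors="2"/>
</testsuites>
EOF

expected_min=31000   # assumed baseline, per the comment above

# Sum the tests="..." attributes across all <testsuite> elements.
actual=$(grep -Eo 'tests="[0-9]+"' "$report" |
  grep -Eo '[0-9]+' | awk '{s+=$1} END {print s+0}')

if [ "$actual" -lt "$expected_min" ]; then
  echo "XQTS run incomplete: only $actual of at least $expected_min tests reported" >&2
  exit 1
fi
echo "XQTS run complete: $actual tests reported"
```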
> We also need to account for the case when there is nothing to compare, e.g. due to network issues.
If it has started running in CI, then as the assets are downloaded from CI I doubt you would encounter a network issue; but if you do, you can handle that.
> Ideally we would also get smarter about the failing tests, so that a PR doesn't swap out e.g. 5 newly green fn:map tests for 5 now-failing fn:array tests without us noticing.
That's already implemented somewhere
We have three options:
- test for the existence of the `html` folder, which is generated at the end
- check the command-line output for signals that indicate success or failure
- modify the xqts runner script to have a non-zero exit code
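The first option could be a small guard in the workflow's shell step. A self-contained sketch; the output directory and the simulated run are assumptions for illustration:

```shell
#!/bin/sh
# Sketch of option 1: fail the CI step unless the html report folder
# exists after the runner finishes.
output_dir=$(mktemp -d)   # stands in for /tmp/xqts-test-output in CI

# exist-xqts-runner.sh would run here; simulate a completed run:
mkdir "$output_dir/html"

if [ ! -d "$output_dir/html" ]; then
  echo "html folder missing - run did not finish, failing the job" >&2
  exit 1
fi
echo "html folder present - run finished"
```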
Related: https://github.com/eXist-db/exist-xqts-runner/pull/38