multicoretests
multicoretests copied to clipboard
Add information to logs with a custom runner
Add a bash script to use as runner for our test suite
This script adds information to the logs:
- explicit names of the test run when they start
- some explicit messages for crashes (SIGSEGV, SIGBUS, ...)
- and, maybe most importantly, anchors in CI logs, so that the main webpage contains the most important information and direct links to precise points of interest in the log
It is written as a bash script but making sure it can be used as a runner also on Windows CI (notably by commenting all ends of lines, without which it fails with errors about '\r's that don't exist...)
Example of result of the changes: the webpage for this run shows directly that we got a SEGV in Ephemeron on trunk (windows). Or this run with SIGABRT in buffers on trunk on macos.
I'm a bit hesitant about this one. As someone who has scanned CI logs for the past two weeks I certainly follow you on the benefits!
On the negative side,
- this adds back in a custom runner now that we finally got rid of the clunky
check_error_count - it is in bash - which is untyped, has lots of quirks (witness the newline issue), and has caused problems for people before (I think there was even an incident at JaneStreet)
I'm wondering if this is something
- that could/should be handled in
duneor inQCheck_base_runnerinstead - or something we could handle with a custom runner?
I've been thinking of the latter to avoid printing counterexamples twice (with and without return values)...
I do agree it’d be nicer to go without bash. I had tried first to see if it would be possible to do in dune, but I couldn’t figure it out (see the fact that the error code shown in windows for a segfault is completely different from what it reports in linux).
The custom runner would be the nicest solution, especially if we also gain on the actual output.
Now, to plead the case for that PR:
- we are the end users for the test suite and can make choices that would otherwise be clearly dangerously fragile, so I thought that going with
bashwould not be a problem if it works in the environments where we run the suite, - as we see in our logs, segfaults are logged differently between Windows and Unix (negative exit value instead of actual signal) and, according to a few small tests, this problem actually comes with using
Unix.systemorSys.command; so writing a custom (OCaml) runner will entail merging both behaviours, whatbashis already doing for us.
For the record, I thought using the echo dune stanza could at least provide the feature of having the path of the test being run before it starts but my attempt failed: https://github.com/shym/multicoretests/actions/runs/3591271602/jobs/6045600378#step:10:41
Overall, I can see benefit to the CI log anchors and would like to salvage them!
I'm wondering whether a good solution would be to extend QCheck to output the right format when running under a suitable environment variable, e.g., QCHECK_GA_CI as that could benefit more QCheck CI users and avoid introducing a specialized runner script. :thinking:
What do you think?
How different an output should be printed to achieve that, e.g., for the below? (I see some "::-header" printing in the PR code)
random seed: 337549857
generated error fail pass / total time test name
[✓] 1000 0 0 1000 / 1000 1.2s STM Bytes test sequential
[✓] 11 0 1 10 / 1000 57.4s STM Bytes test parallel
--- Info -----------------------------------------------------------------------
Negative test STM Bytes test parallel failed as expected (21 shrink steps):
|
|
.---------------------.
| |
To_seq (Fill (7, 5, 'K'))
+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Messages for test STM Bytes test parallel:
Results incompatible with linearized model
|
|
.------------------------------------------------------------------------------------------.
| |
To_seq : ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'K'; 'K'; 'K'; 'K'; 'a'; 'a'; 'a'; 'a'] (Fill (7, 5, 'K')) : Ok (())
================================================================================
success (ran 2 tests)
The documentation explains here the syntax to report an error and create the anchor; tldr: echo "::error title=Failure in STM Bytes test parallel::STM Bytes test parallel failed on its 11th run after 57.4 seconds with seed 337549857".
That would nice indeed.
The bash runner provides one thing particularly useful for multicoretests that would be harder to bring into QCheck: it can report when a test crashes (on a signal). That would require wrapping all our tests in some sort of fork_prop (and ensuring we manage to report it as the proper signal even on Windows, something that dune fails at, by the way).
Ah, this old brain is finally starting to understand :sweat_smile: That's why you need a "parent process" to catch crashes and print the anchor (rather than just crash and burn).
For the weird Windows signalling, I think we should report it along with test case if possible to the dune developers.
BTW, thanks for the documentation link - much appreciated! :pray:
Potentially QCheck could ::group ... ::endgroup a list of tests in a test suite in such a mode.