multicoretests icon indicating copy to clipboard operation
multicoretests copied to clipboard

Add information to logs with a custom runner

Open shym opened this issue 2 years ago • 8 comments

Add a bash script to use as runner for our test suite

This script adds information to the logs:

  • explicit names of the test run when they start
  • some explicit messages for crashes (SIGSEGV, SIGBUS, ...)
  • and, maybe most importantly, anchors in CI logs, so that the main webpage contains the most important information and direct links to precise points of interest in the log

It is written as a bash script but making sure it can be used as a runner also on Windows CI (notably by commenting all ends of lines, without which it fails with errors about '\r's that don't exist...)

Example of result of the changes: the webpage for this run shows directly that we got a SEGV in Ephemeron on trunk (windows). Or this run with SIGABRT in buffers on trunk on macos.

shym avatar Nov 21 '22 21:11 shym

I'm a bit hesitant about this one. As someone who has scanned CI logs for the past two weeks I certainly follow you on the benefits!

On the negative side,

  • this adds back in a custom runner now that we finally got rid of the clunky check_error_count
  • it is in bash - which is untyped, has lots of quirks (witness the newline issue), and has caused problems for people before (I think there was even an incident at JaneStreet)

I'm wondering if this is something

  • that could/should be handled in dune or in QCheck_base_runner instead
  • or something we could handle with a custom runner?

I've been thinking of the latter to avoid printing counterexamples twice (with and without return values)...

jmid avatar Nov 25 '22 10:11 jmid

I do agree it’d be nicer to go without bash. I had tried first to see if it would be possible to do in dune, but I couldn’t figure it out (see the fact that the error code shown in windows for a segfault is completely different from what it reports in linux). The custom runner would be the nicest solution, especially if we also gain on the actual output.

shym avatar Nov 28 '22 17:11 shym

Now, to plead the case for that PR:

  • we are the end users for the test suite and can make choices that would otherwise be clearly dangerously fragile, so I thought that going with bash would not be a problem if it works in the environments where we run the suite,
  • as we see in our logs, segfaults are logged differently between Windows and Unix (negative exit value instead of actual signal) and, according to a few small tests, this problem actually comes with using Unix.system or Sys.command; so writing a custom (OCaml) runner will entail merging both behaviours, what bash is already doing for us.

shym avatar Nov 29 '22 16:11 shym

For the record, I thought using the echo dune stanza could at least provide the feature of having the path of the test being run before it starts but my attempt failed: https://github.com/shym/multicoretests/actions/runs/3591271602/jobs/6045600378#step:10:41

shym avatar Dec 01 '22 11:12 shym

Overall, I can see benefit to the CI log anchors and would like to salvage them!

I'm wondering whether a good solution would be to extend QCheck to output the right format when running under a suitable environment variable, e.g., QCHECK_GA_CI as that could benefit more QCheck CI users and avoid introducing a specialized runner script. :thinking:

What do you think?

How different an output should be printed to achieve that, e.g., for the below? (I see some "::-header" printing in the PR code)

random seed: 337549857
generated error fail pass / total     time test name
[✓] 1000    0    0 1000 / 1000     1.2s STM Bytes test sequential
[✓]   11    0    1   10 / 1000    57.4s STM Bytes test parallel

--- Info -----------------------------------------------------------------------

Negative test STM Bytes test parallel failed as expected (21 shrink steps):

                             |           
                             |           
                  .---------------------.
                  |                     |           
                To_seq          (Fill (7, 5, 'K'))  


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test STM Bytes test parallel:

  Results incompatible with linearized model

                                                                                               |                                            
                                                                                               |                                            
                                                 .------------------------------------------------------------------------------------------.
                                                 |                                                                                          |                                            
     To_seq : ['a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'a'; 'K'; 'K'; 'K'; 'K'; 'a'; 'a'; 'a'; 'a']                                (Fill (7, 5, 'K')) : Ok (())                               

================================================================================
success (ran 2 tests)

jmid avatar Jan 31 '23 11:01 jmid

The documentation explains here the syntax to report an error and create the anchor; tldr: echo "::error title=Failure in STM Bytes test parallel::STM Bytes test parallel failed on its 11th run after 57.4 seconds with seed 337549857". That would nice indeed. The bash runner provides one thing particularly useful for multicoretests that would be harder to bring into QCheck: it can report when a test crashes (on a signal). That would require wrapping all our tests in some sort of fork_prop (and ensuring we manage to report it as the proper signal even on Windows, something that dune fails at, by the way).

shym avatar Jan 31 '23 13:01 shym

Ah, this old brain is finally starting to understand :sweat_smile: That's why you need a "parent process" to catch crashes and print the anchor (rather than just crash and burn).

For the weird Windows signalling, I think we should report it along with test case if possible to the dune developers.

jmid avatar Jan 31 '23 14:01 jmid

BTW, thanks for the documentation link - much appreciated! :pray: Potentially QCheck could ::group ... ::endgroup a list of tests in a test suite in such a mode.

jmid avatar Jan 31 '23 14:01 jmid