
[ocaml5-issue] Assertion failure during parallel `STM Out_channel` or `Sys` tests

Open jmid opened this issue 1 year ago • 1 comments

The merge of #445 to main triggered an assertion failure and abort on Linux trunk during `STM Out_channel test parallel`: https://github.com/ocaml-multicore/multicoretests/actions/runs/8441854686/job/23121952174

random seed: 115742799
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential (generating)
[✓] 1000    0    0 1000 / 1000     3.7s STM Out_channel test sequential

[02] file runtime/domain.c; line 326 ### Assertion failed: s->running
File "src/io/dune", line 40, characters 7-16:
40 |  (name stm_tests)
            ^^^^^^^^^
(cd _build/default/src/io && ./stm_tests.exe --verbose)
Command got signal ABRT.
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test parallel

jmid, Mar 27 '24 11:03

Saw this again in focused tests on #304 (Linux 5.3.0+trunk debug), this time on `STM Sys test parallel`: https://github.com/ocaml-multicore/multicoretests/actions/runs/9131253128/job/25110039250?pr=304

Starting 6-th run

random seed: 357814880
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential (generating)
[✓] 1000    0    0 1000 / 1000     9.4s STM Sys test sequential

[ ]    0    0    0    0 / 2500     0.0s STM Sys test parallel
[02] file runtime/domain.c; line 325 ### Assertion failed: s->running
/usr/bin/bash: line 1: 1943510 Aborted                 (core dumped) ./focusedtest.exe -v
[ ]  559    0    0  559 / 2500    50.7s STM Sys test parallel

jmid, May 17 '24 22:05

I just observed this locally on Linux running OCaml 5.2.0, while trying a run with an extreme `space_overhead` setting (`o=20`) and the debug runtime to see if it would reveal anything:

multicoretests$ OCAMLRUNPARAM="s=4096,o=20,v=0,V=1" dune build "@ci" -j1 --no-buffer --display=quiet --cache=disabled --error-reporting=twice --profile=debug-runtime src/
[...]
random seed: 446171203
generated error fail pass / total     time test name
[ ]    1    0    0    1 / 1000     0.5s Lin In_channel test with Domain (shrinking:   11.0003)[01] file runtime/domain.c; line 336 ### Assertion failed: s->running
File "src/io/dune", line 21, characters 7-23:
21 |  (name lin_tests_domain)
            ^^^^^^^^^^^^^^^^
Command got signal ABRT.
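
For anyone trying to reproduce this outside of `OCAMLRUNPARAM`, here is a minimal sketch of applying the same `space_overhead` value from inside a program, using the standard library's `Gc.get` and `Gc.set`. It is a hypothetical stand-alone snippet, not code from multicoretests. It only covers the `o=20` part of the run above: the debug runtime and `V=1` are chosen at build or launch time, not from within the program.

```ocaml
(* Hypothetical stand-alone sketch, not part of multicoretests:
   lower space_overhead to 20 so the major GC works more aggressively,
   mirroring the o=20 entry of the OCAMLRUNPARAM setting above. *)
let () =
  let before = (Gc.get ()).Gc.space_overhead in
  (* Copy the current GC parameters, overriding only space_overhead. *)
  Gc.set { (Gc.get ()) with Gc.space_overhead = 20 };
  let after = (Gc.get ()).Gc.space_overhead in
  Printf.printf "space_overhead: %d -> %d\n" before after
```

The starting value printed depends on the OCaml version and on any `OCAMLRUNPARAM` already in effect.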

jmid, Aug 19 '24 13:08

Closing, as this has been fixed upstream and added in #475.

jmid, Sep 04 '24 14:09