`PackageTests/NewBuild/CmdRun/Terminate` brittle, may fail on CI
The test PackageTests/NewBuild/CmdRun/Terminate has been caught to fail on CI. Maybe it has races?
There was a previous attempt to fix this issue:
- #8080
Ping @robx .
https://github.com/haskell/cabal/runs/7945613288?check_suite_focus=true#step:17:35
...
threads: 2
tests to run: 427
PackageTests/NewBuild/CmdRun/Terminate/cabal.test.hs FAIL (12.33s)
...
Building executable 'exe' for RunKill-1.0..
...
Linking /home/runner/work/cabal/cabal/cabal-testsuite/PackageTests/NewBuild/CmdRun/Terminate/dist-newstyle/build/x86_64-linux/ghc-8.8.4/RunKill-1.0/x/exe/build/exe/exe ...
+ /home/runner/work/cabal/cabal/dist-newstyle-validate-ghc-8.8.4/build/x86_64-linux/ghc-8.8.4/cabal-install-3.9.0.0/x/cabal/build/cabal/cabal v2-run exe
...
about to sleep
done sleeping
exiting
cabal: waitForProcess: does not exist (No child processes)
terminated
stderr:
*** Exception: unexpected: done sleeping
CallStack (from HasCallStack):
assertOutputDoesNotContain, called at PackageTests/NewBuild/CmdRun/Terminate/cabal.test.hs:35:3 in main:Main
*** unexpected failure for PackageTests/NewBuild/CmdRun/Terminate/cabal.test.hs
Judging from the ticket and the error line (34 vs. 35), the two runs are slightly different:
Up to date
terminated
vs.
about to sleep
done sleeping
exiting
cabal: waitForProcess: does not exist (No child processes)
terminated
Thanks for catching this!
Judging from the ticket and the error line (34 vs. 35), the two runs are slightly different:
The test has changed in the meantime, so a changed output is to be expected. I do think the fix was good, and this is separate issue.
It would be good to know how often this happens, has it been observed more often than this once? It might be something as annoying as the write to disk taking more than a couple of seconds, or the controlling process randomly being paused for multiple seconds due to OS scheduling.
I think the best next step to narrow this down would be to instrument the test a bit with timestamped logging, unless this can be reproduced somewhat reliably locally.
@andreasabel Do you have any info on how frequent this is?
@andreasabel Do you have any info on how frequent this is?
I just caught it and reported it, don't know how frequently it occurs.
Ah good then, considering the "high priority" label I was worried that things might be on fire.
I added "priority: high" because CI shouldn't fail (as this will stop the whole merge process...).
I haven't seen this test fail lately. Is there a way to grep CI for this failure?
I think this may no longer be relevant.
Seen it 2024-04-12 in
- #9886
Hit it in #9949: https://github.com/haskell/cabal/actions/runs/8877985809/job/24372711684?pr=9949