cvise
cvise stops intermittently
Occasionally I see
[1]+ Stopped cvise --clang-delta-std=c++20 --print-diff ./check.sh sstable_datafile_test.cc
and I have to restart the job with fg. This is of course problematic for unattended runs.
The interestingness test runs gdb in batch mode (gdb also runs the program). Perhaps gdb signals interfere with cvise?
A workaround is to send SIGCONT in a loop from some shell script, but I'd like to understand and fix it.
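For reference, the SIGCONT-in-a-loop workaround mentioned above could look roughly like the following. This is only a sketch, assuming Linux (it reads /proc/<pid>/stat) and that the PID of the cvise process is known; the helper names is_stopped, resume_if_stopped and watch are made up for illustration and are not part of cvise.

```python
import os
import signal
import time


def is_stopped(pid):
    """Return True if /proc/<pid>/stat reports state 'T' (job-control stop). Linux only."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field comes right after the parenthesised command name,
    # which may itself contain spaces or parentheses, so split on the last ')'.
    state = data.rsplit(")", 1)[1].split()[0]
    return state == "T"


def resume_if_stopped(pid):
    """Send SIGCONT to pid if it is currently stopped; return True if a signal was sent."""
    if is_stopped(pid):
        os.kill(pid, signal.SIGCONT)
        return True
    return False


def watch(pid, interval=5.0):
    """Poll pid every `interval` seconds until the process no longer exists."""
    while True:
        try:
            if resume_if_stopped(pid):
                print(f"{time.ctime()}: process {pid} was stopped, sent SIGCONT")
        except (FileNotFoundError, ProcessLookupError):
            return  # the process has exited
        time.sleep(interval)
```

Calling watch(pid) from a second terminal with the PID of the cvise master process keeps the run going. The logged timestamps might also help correlate the stops with whatever cvise was doing at the time, though this only papers over the problem and does not explain it.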
Hmm, I haven't seen this behavior during my own cvise use. So you are saying that the master process (cvise ...) got moved to the background and was stopped? All the interestingness tests are run in separate sub-processes (using the Pebble library), so they should not interact with the master process at all.
Can you get a Python back-trace of the master process when it gets moved to the background? Does it really happen only if the interestingness test uses gdb?
I don't know if it was moved into the background, or if something else happened.
I don't use cvise very often (but when I do, it's for multi-day reductions), so I can't tell if it's related to gdb or not. It seems likely since gdb plays with signals.
I don't know how to generate a Python backtrace (and if it's stopped, I'm sure I won't get one).
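One possible way to get a Python backtrace out of a long-running process is the standard library's faulthandler module. The sketch below assumes these lines can be added to the start-up of the Python process in question (nothing in this thread says cvise already does this), and the function name install_traceback_dumper is made up for illustration.

```python
import faulthandler
import signal
import sys


def install_traceback_dumper(stream=None, signum=signal.SIGUSR1):
    """Dump the Python tracebacks of all threads to `stream` when `signum` arrives.

    Unix only. `stream` must have a real file descriptor and must stay open
    for as long as the handler is installed.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.register(signum, file=stream, all_threads=True)
```

After install_traceback_dumper() has run, `kill -USR1 <pid>` from another shell should print the tracebacks. Note that a stopped process only handles the signal once it has been continued, so the dump would show the state right after resuming rather than at the moment of the stop.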
I guess a workaround is to package the interestingness test into a container, this should isolate any signals leakage. Still, it would be nice if cvise protected itself from this.
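A lighter-weight form of isolation than a container might be to run the interestingness test in its own session, with no controlling terminal and with stdin redirected from /dev/null. The sketch below is an assumption about what could help, not a confirmed fix: it has not been verified against this problem, it would not stop a signal sent directly to the cvise PID, and the name run_isolated is made up for illustration.

```python
import subprocess


def run_isolated(cmd, timeout=None):
    """Run cmd in a new session and return its exit status.

    start_new_session=True makes the child call setsid(), so it starts out
    without a controlling terminal and in its own process group.
    """
    completed = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # never read from the terminal
        start_new_session=True,
        timeout=timeout,
    )
    return completed.returncode
```

A hypothetical wrapper script could then call `sys.exit(run_isolated(["./check.sh"]))` and be passed to cvise in place of the original script, so that the exit status of the real test is preserved.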
I don't know if it was moved into the background, or if something else happened.
Well the described behavior seems pretty unusual.
I don't use cvise very often (but when I do, it's for multi-day reductions), so I can't tell if it's related to gdb or not. It seems likely since gdb plays with signals.
Anyway, can you please attach a reproducer that I can run locally to try to reproduce it?
I guess a workaround is to package the interestingness test into a container, this should isolate any signals leakage. Still, it would be nice if cvise protected itself from this.
Well, that sounds like a solution, but C-Vise should not behave the way you described ;)
https://github.com/avikivity/scylladb/commits/bug-13730-investigation
Steps to reproduce:
- clone the repo into a Fedora 38 installation (or anything with clang 16 + all the dependencies)
- run ./cvise.sh
- wait for long, long hours
There's a container image with all the dependencies: docker.io/scylladb/scylla-toolchain:fedora-38-20230517. However, I did not try reproducing within the container, only on my Fedora 38 host. Note you'll need to run the container with --privileged, since ptrace isn't available otherwise.
The problem reproduces rarely. I have a feeling it happens when the pass changes, but it hasn't happened enough times, and usually I wasn't looking when it did.
Thanks for the reproducer. Note that I'm changing jobs right now, so I will get to it about a month from now, when I'll have a reasonably powerful machine to reproduce it on. Hope that's fine?
All right, so I've changed jobs and now have a reasonably fast desktop machine.
Looking at your reproducer: can you please create a container (the provided one, docker.io/scylladb/scylla-toolchain:fedora-38-20230517, seems to be unavailable), add your git branch to it, and provide me a link? Thanks! Note that it seems one needs to have built things like /home/avi/scylla/build/release/seastar/libseastar.a (and probably others) in order to link the sstable_datafile_test_g binary.