seastar

sa_sigaction handler installed by install_oneshot_signal_handler deadlocks if reentered

Open bhalevy opened this issue 2 years ago • 2 comments

Seen in https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1405/artifact/testlog/x86_64/release/database_test.test_truncate_without_snapshot_during_writes.114.log

Reactor stalled for 5210231 ms on shard 0. Backtrace: 0x3b708e2 0x3b6f540 0x3b707f0 0x7f24b7830a1f 0x3b9fd2d 0x36a78b1 0x7f24b7830a1f 0x10497 0x7f24b6d031a3

Decoded:

seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1366
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1108
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1125
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1349
?? ??:0
seastar::internal::cpu_relax() at ./build/release/seastar/./seastar/include/seastar/util/spinlock.hh:43
 (inlined by) seastar::util::spinlock::lock() at ./build/release/seastar/./seastar/include/seastar/util/spinlock.hh:94
 (inlined by) lock_guard at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/std_mutex.h:229
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:3656
 (inlined by) __invoke at ./build/release/seastar/./seastar/src/core/reactor.cc:3655
wasmtime_runtime::traphandlers::unix::trap_handler at wasmtime_runtime.9xv9ixev-cgu.15:?

I was able to reproduce this with the failing Scylla unit test, but because of compiler optimizations I couldn't get to the bottom of it by attaching gdb to the hung process.

However, it is evident that the SIGSEGV handler never returns, so the spinlock is never released. Any further SIGSEGV or SIGABRT, whether raised from within the signal handler itself or concurrently on another thread, will then spin forever trying to acquire the spinlock in https://github.com/scylladb/seastar/blob/1d4432ed281ef3c610c4ef17968f15358a1d6755/src/core/reactor.cc#L3656
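A minimal sketch of why re-entry hangs, assuming a simple test-and-set spinlock as a stand-in for `seastar::util::spinlock` (the `toy_spinlock` class and `nested_entry_can_acquire()` helper are hypothetical, not Seastar's code). The outer handler frame holds the lock and cannot resume until the nested handler frame returns, so a nested `lock()` on the same thread can never succeed. `try_lock()` is used to observe this without actually hanging.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical test-and-set spinlock, a simplified stand-in for
// seastar::util::spinlock (not Seastar's actual implementation).
class toy_spinlock {
    std::atomic<bool> busy_{false};
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() noexcept {
        return !busy_.exchange(true, std::memory_order_acquire);
    }
    // Spins until the lock is acquired; never returns if the holder cannot run.
    void lock() noexcept {
        while (!try_lock()) {
            // Seastar calls cpu_relax() in its spin loop; a nested signal
            // handler on the same thread would loop here forever.
        }
    }
    void unlock() noexcept {
        assert(busy_.load(std::memory_order_relaxed) && "unlock() without lock()");
        busy_.store(false, std::memory_order_release);
    }
};

// Simulates the sequence in the report: the outer SIGSEGV handler takes the
// lock and, while it runs the action, a nested fault re-enters the handler on
// the same thread. Returns whether the nested acquisition could succeed.
bool nested_entry_can_acquire() {
    toy_spinlock lock;
    lock.lock();                // outer handler frame: lock held
    bool ok = lock.try_lock();  // nested handler frame: lock() would spin here
    lock.unlock();              // reached only because try_lock() does not block
    return ok;
}
```

The same reasoning covers the concurrent case: once the first handler is stuck (and so never unlocks), a fault or abort on any other thread also spins forever in `lock()`.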

bhalevy, Jul 19 '22 11:07

One example of an apparently nested segfault:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f10cabff640 (LWP 2656476))]
#0  0x00007f10cd36d498 in uw_frame_state_for () from /lib64/libgcc_s.so.1
(gdb) bt
#0  0x00007f10cd36d498 in uw_frame_state_for () from /lib64/libgcc_s.so.1
#1  0x00007f10cd36f1a4 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x00007f10cd29b3c6 in backtrace () from /lib64/libc.so.6
#3  0x0000000003b722a9 in seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) (func=...) at ./seastar/include/seastar/util/backtrace.hh:59
#4  seastar::backtrace_buffer::append_backtrace (this=0x6010001333f8) at ./seastar/src/core/reactor.cc:758
#5  seastar::print_with_backtrace (buf=..., oneline=false) at ./seastar/src/core/reactor.cc:788
#6  0x0000000003ba16ad in seastar::print_with_backtrace (cause=<optimized out>, oneline=false) at ./seastar/src/core/reactor.cc:800
#7  seastar::sigsegv_action () at ./seastar/src/core/reactor.cc:3675
#8  seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const (this=<optimized out>, sig=<optimized out>, info=<optimized out>, p=<optimized out>) at ./seastar/src/core/reactor.cc:3656
#9  seastar::install_oneshot_signal_handler<11, (void (*)())(&seastar::sigsegv_action)>()::{lambda(int, siginfo_t*, void*)#1}::__invoke(int, siginfo_t*, void*) (sig=11, info=<optimized out>, p=0x13a4a8000) at ./seastar/src/core/reactor.cc:3654
#10 0x00000000036a8b72 in wasmtime_runtime::traphandlers::unix::trap_handler ()
#11 <signal handler called>
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
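One possible mitigation, sketched under assumptions rather than taken from Seastar: replace the blocking spinlock with a lock-free "already entered" flag, so that a nested or concurrent entry never waits. Instead, it restores the default disposition and returns. For a real SIGSEGV, returning re-executes the faulting instruction, which then terminates the process with the default action (and a core dump) instead of hanging. The names `oneshot_handler`, `simulated_action`, `entered` and the counters are hypothetical. The nested fault inside the unwinder (frames #0 to #2 above) is simulated by calling the handler directly from the action.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Hypothetical one-shot state; lock-free atomics are usable from signal handlers.
static std::atomic<bool> entered{false};
static std::atomic<int> action_runs{0};
static std::atomic<int> rejected_entries{0};

extern "C" void oneshot_handler(int sig);

// Stand-in for sigsegv_action(). In the gdb trace, backtrace() faulted inside
// the unwinder and re-entered the handler; here that nested entry is simulated
// by calling the handler directly on the same thread.
static void simulated_action(int sig) {
    assert(entered.load());  // the action only runs while the guard is set
    ++action_runs;
    oneshot_handler(sig);    // nested entry: must return instead of blocking
}

extern "C" void oneshot_handler(int sig) {
    bool expected = false;
    if (!entered.compare_exchange_strong(expected, true)) {
        // Nested (same thread) or concurrent (other thread) entry: do not wait.
        // Restore the default action; for a real fault, returning re-executes
        // the faulting instruction, which then terminates the process.
        ++rejected_entries;
        std::signal(sig, SIG_DFL);
        return;
    }
    simulated_action(sig);
    std::signal(sig, SIG_DFL);  // one-shot: the flag is deliberately never cleared
}
```

The trade-off of not waiting is that a concurrent fault on another thread may kill the process before the first thread has finished printing its backtrace; a hang is avoided at the cost of possibly losing that report.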

bhalevy, Jul 19 '22 13:07

This is not a Seastar issue but a Scylla one: it stems from Scylla's use of wasmtime, which installs its own signal handlers.
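Both backtraces show Seastar's handler being called from `wasmtime_runtime::traphandlers::unix::trap_handler`, i.e. the handler installed later forwards signals it does not handle itself to the previously installed one. Below is a minimal POSIX sketch of that chaining pattern; it illustrates the general mechanism only and is not wasmtime's code (`first_handler`, `chaining_handler`, `fault_is_owned`, `previous` and `install_chain` are hypothetical names). One consequence worth noting: when a handler is invoked by a direct call from another handler, the kernel applies the outer handler's `sa_mask` and flags, not the inner one's, so whether a nested SIGSEGV can be delivered while the inner handler runs depends on how the outer handler was installed.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <signal.h>

// Hypothetical counters recording which handlers ran.
static std::atomic<int> first_calls{0};
static std::atomic<int> chained_calls{0};
// Disposition that was installed before chaining_handler (saved by sigaction).
static struct sigaction previous;

// Plays the role of the handler installed first (Seastar's, in the report).
static void first_handler(int, siginfo_t*, void*) {
    ++first_calls;
}

// Hypothetical ownership test, e.g. "did the fault happen in JIT-compiled code?"
static bool fault_is_owned(const siginfo_t*) {
    return false;
}

// Plays the role of a handler installed later (wasmtime's, in the report): it
// handles what it owns and forwards everything else to the previous handler.
static void chaining_handler(int sig, siginfo_t* info, void* ctx) {
    ++chained_calls;
    if (fault_is_owned(info)) {
        return;
    }
    if (previous.sa_flags & SA_SIGINFO) {
        // Direct call: previous.sa_mask and previous.sa_flags are NOT applied.
        previous.sa_sigaction(sig, info, ctx);
    } else if (previous.sa_handler != SIG_DFL && previous.sa_handler != SIG_IGN) {
        previous.sa_handler(sig);
    }
    // SIG_DFL / SIG_IGN: a real implementation would reset and re-raise; omitted.
}

// Installs first_handler, then chaining_handler on top of it, for `sig`.
bool install_chain(int sig) {
    struct sigaction sa {};
    sa.sa_sigaction = first_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, nullptr) != 0) return false;

    struct sigaction chained {};
    chained.sa_sigaction = chaining_handler;
    chained.sa_flags = SA_SIGINFO;
    sigemptyset(&chained.sa_mask);
    if (sigaction(sig, &chained, &previous) != 0) return false;
    // `previous` now holds first_handler, which chaining_handler forwards to.
    assert((previous.sa_flags & SA_SIGINFO) && previous.sa_sigaction == first_handler);
    return true;
}
```

After `install_chain(SIGUSR1)`, a single `raise(SIGUSR1)` runs `chaining_handler`, which in turn calls `first_handler`, mirroring the trap_handler → Seastar handler frames in the traces above.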

avikivity, Jul 19 '22 13:07