
stall-detector: Try hard not to crash while collecting backtrace

Open xemul opened this issue 1 year ago • 1 comments

Sometimes the stall-detector signal arrives in the middle of exception handling. If a stall is detected, stack unwinding starts in order to collect the stalled backtrace. Since exception handling also unwinds the stack, the two unwinders need to cooperate carefully, which is not guaranteed (spoiler: they don't cooperate carefully). In the unlucky case a segmentation fault happens and the app is killed with SIGSEGV.

This patch lets the stall detector bail out if SIGSEGV arrives while it is collecting the backtrace, and still emit a stall report that is minimal yet detailed enough.
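The bail-out idea can be illustrated with a minimal sketch. This is not the code from this patch or from Seastar: every name below (`collect_guarded`, `frame_walker`, `on_segv`, `bail_out_point`) is hypothetical, and the sketch assumes a POSIX system and a single thread (a per-thread reactor would presumably need `thread_local` state). It guards a backtrace walker with `sigsetjmp`/`siglongjmp` so that a SIGSEGV raised during collection returns the frames gathered so far instead of killing the process.

```cpp
#include <cassert>
#include <cstddef>
#include <setjmp.h>
#include <signal.h>

// Hypothetical names throughout; an illustration, not Seastar's implementation.
static sigjmp_buf bail_out_point;
static volatile sig_atomic_t collecting = 0;

// SIGSEGV handler: if the fault hit while a backtrace was being collected,
// jump back to the guard. Otherwise restore the default action and re-raise,
// so unrelated crashes still terminate the process as usual.
extern "C" void on_segv(int sig) {
    if (collecting) {
        collecting = 0;
        siglongjmp(bail_out_point, 1);
    }
    signal(sig, SIG_DFL);
    raise(sig);
}

// A walker stores frames and reports its progress through *count, so the
// partial result is still available after a siglongjmp.
using frame_walker = void (*)(void** frames, std::size_t max,
                              volatile std::size_t* count);

// Runs the walker under a temporary SIGSEGV handler. Returns the number of
// frames collected; sets `truncated` if collection was cut short by a fault.
std::size_t collect_guarded(frame_walker walk, void** frames,
                            std::size_t max, bool& truncated) {
    assert(walk != nullptr);
    struct sigaction sa = {}, old = {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    // volatile: both are read after a possible siglongjmp.
    volatile std::size_t count = 0;
    volatile bool faulted = false;

    // savemask = 1 restores the signal mask on siglongjmp, which unblocks
    // SIGSEGV again after leaving the handler.
    if (sigsetjmp(bail_out_point, 1) == 0) {
        collecting = 1;
        walk(frames, max, &count);
    } else {
        faulted = true;
    }
    collecting = 0;
    sigaction(SIGSEGV, &old, nullptr);  // put the previous handler back
    truncated = faulted;
    return count;
}
```

A caller would pass its real unwinding routine as the walker and print whatever frames came back, marking the report as truncated when the flag is set. As with the patch itself, this only covers the SIGSEGV symptom; a walker that loops forever is not caught.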

xemul avatar Sep 04 '24 11:09 xemul

Doesn't solve the problem entirely, since SIGSEGV isn't the only possible symptom (you could get an infinite loop, for example, why not), but I guess it prevents a crash in the cases where that's enough (which is probably the great majority of cases), and doesn't hurt in the others, so why not.

michoecho avatar Sep 04 '24 11:09 michoecho

It's worth noting there is a reproducer now for at least one type of crash; see https://github.com/scylladb/seastar/issues/2697 and https://github.com/scylladb/seastar/pull/2714. So perhaps it is worth revisiting this PR, as part of the reason it stalled seemed to be the lack of a repro?

travisdowns avatar Apr 08 '25 14:04 travisdowns

upd: rebased to check #2714

xemul avatar Apr 25 '25 08:04 xemul

closing in favor of #2714

xemul avatar May 26 '25 14:05 xemul