Improve debugging experience by having Vespa C++ processes dump stack trace on crash
If a Vespa C++ process terminates unexpectedly (e.g. from raising SIGSEGV, SIGABRT etc.) and core dumps are disabled on the host, very little information is left for debugging the root cause. Additionally, visibility of the crash itself is limited: it has to be inferred from inspecting Cluster Controller events (nodes going down in the cluster state), local config sentinel events or the host's /var/log/messages (and/or system journal).
The most fundamental piece of information we want to have for any crash (including explicitly triggered assertion failures) is the symbolized stack trace at the crash site.
Printing this information requires installing signal handlers for the relevant signals and tracing/symbolizing the stack frames. Some likely requirements:
- Stack frame tracing and symbolizing should be async signal safe.
- To be able to deal with a SIGSEGV caused by exhausting stack space (and subsequently hitting a guard page) we'd need an alternate signal handler stack.
- The signal handler should keep track of any previously installed handler and call it after it's done. This chaining is necessary to properly defer to handlers implicitly installed by AddressSanitizer et al.
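To make the three requirements above concrete, here is a minimal sketch of what a hand-rolled handler could look like on Linux with glibc. It is not Vespa code: the names `install_crash_handlers`, `crash_handler`, `kSignals` and `g_previous` are hypothetical, the 64 KiB alternate stack size is an arbitrary assumption, and `backtrace()`/`backtrace_symbols_fd()` from `<execinfo.h>` are glibc extensions that are only approximately async signal safe (hence the warm-up call at install time).

```cpp
#include <csignal>
#include <cstring>
#include <execinfo.h>   // glibc-specific: backtrace(), backtrace_symbols_fd()
#include <unistd.h>

namespace {

// Hypothetical set of signals to cover; adjust as needed.
constexpr int kSignals[] = {SIGSEGV, SIGABRT, SIGBUS, SIGILL, SIGFPE};
constexpr int kNumSignals = sizeof(kSignals) / sizeof(kSignals[0]);

// Previously installed dispositions, kept so that we can chain to them.
struct sigaction g_previous[kNumSignals];

void write_str(const char* s) {
    // write() is async signal safe, unlike printf() and friends.
    ssize_t ignored = write(STDERR_FILENO, s, strlen(s));
    (void)ignored;
}

void crash_handler(int signo, siginfo_t* info, void* ctx) {
    write_str("*** fatal signal caught, stack trace follows ***\n");
    void* frames[64];
    int n = backtrace(frames, 64);
    // Writes directly to the fd without allocating the result with malloc().
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    // Chain to whatever handler was installed before us (e.g. by a sanitizer).
    for (int i = 0; i < kNumSignals; ++i) {
        if (kSignals[i] != signo) continue;
        const struct sigaction& prev = g_previous[i];
        if (prev.sa_flags & SA_SIGINFO) {
            if (prev.sa_sigaction != nullptr) {
                prev.sa_sigaction(signo, info, ctx);
                return;
            }
        } else if (prev.sa_handler != SIG_DFL && prev.sa_handler != SIG_IGN) {
            prev.sa_handler(signo);
            return;
        }
    }
    // Nothing to chain to: restore the default action and re-raise so the
    // process still terminates with the original signal.
    signal(signo, SIG_DFL);
    raise(signo);
}

bool install_crash_handlers() {
    // Alternate stack so the handler can run even after a stack overflow.
    // Note: sigaltstack() applies to the calling thread only.
    static char alt_stack[64 * 1024];
    stack_t ss{};
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof(alt_stack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0) return false;

    // Warm up the unwinder outside signal context; the first backtrace()
    // call may load a shared library and allocate memory.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa{};
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    for (int i = 0; i < kNumSignals; ++i) {
        if (sigaction(kSignals[i], &sa, &g_previous[i]) != 0) return false;
    }
    return true;
}

} // namespace
```

A process would call `install_crash_handlers()` once, early in `main()`. The sketch leaves several gaps: every additional thread would need its own alternate stack, the output is raw addresses plus whatever the dynamic symbol table offers (linking with `-rdynamic` helps, but names stay mangled), and there is no protection against two threads crashing at the same time.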
Doing this the Right Way(tm) is a bit of work, so if we end up with a transitive dependency on Abseil once we upgrade our C++ protobuf library dependencies, we might want to consider using its failure handler utility, which satisfies the above requirements.
// InstallFailureSignalHandler()
//
// Installs a signal handler for the common failure signals `SIGSEGV`, `SIGILL`,
// `SIGFPE`, `SIGABRT`, `SIGTERM`, `SIGBUS`, and `SIGTRAP` (provided they exist
// on the given platform). The failure signal handler dumps program failure data
// useful for debugging in an unspecified format to stderr. This data may
// include the program counter, a stacktrace, and register information on some
// systems; do not rely on an exact format for the output, as it is subject to
// change.
void InstallFailureSignalHandler(const FailureSignalHandlerOptions& options);