Race condition between ProbeProcessAliveWithSigTerm in RouDi and SignalWatcher singleton in applications

Open gpalmer-latai opened this issue 8 months ago • 5 comments

Required information

When shutting down applications, RouDi periodically sends SIGTERM signals. Usually these are all eaten by the SignalWatcher in applications, which simply sets a flag that a SIGTERM was received.

But there is a very small window of time, right after the destructor for the SignalWatcher has run and reset the signal handler but before the process is actually dead, in which a SIGTERM triggers the default signal handler, which terminates the program with a non-zero exit code.
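To make the window concrete, here is a minimal sketch that reproduces the mechanism with plain POSIX calls, without iceoryx. The handler `recordSigterm`, the flag `g_sigtermSeen`, and the function `sigtermOutcome` are hypothetical stand-ins, not the actual SignalWatcher code. The child installs a flag-setting handler, optionally resets SIGTERM to the default disposition (emulating the state after the destructor has run), and then keeps "shutting down" while the parent sends one SIGTERM.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace
{
volatile std::sig_atomic_t g_sigtermSeen = 0;

// Stand-in for a handler that only records that SIGTERM arrived.
void recordSigterm(int) { g_sigtermSeen = 1; }
} // namespace

// Returns the signal number that killed the child, 0 if the child exited
// normally with code 0, or -1 on a setup error.
int sigtermOutcome(bool handlerAlreadyReset)
{
    int ready[2];
    if (pipe(ready) != 0) { return -1; }

    const pid_t child = fork();
    if (child < 0) { return -1; }

    if (child == 0)
    {
        close(ready[0]);
        struct sigaction action{};
        action.sa_handler = recordSigterm;
        sigemptyset(&action.sa_mask);
        sigaction(SIGTERM, &action, nullptr);

        if (handlerAlreadyReset)
        {
            // Assumption: emulates a destructor that restores the default disposition.
            action.sa_handler = SIG_DFL;
            sigaction(SIGTERM, &action, nullptr);
        }

        const char token = 1;
        if (write(ready[1], &token, 1) != 1) { _exit(2); }

        // "Still shutting down": wait up to roughly 5 s for the flag, then exit cleanly.
        for (int i = 0; i < 5000 && g_sigtermSeen == 0; ++i) { usleep(1000); }
        _exit(0);
    }

    close(ready[1]);
    char token = 0;
    const bool childReady = (read(ready[0], &token, 1) == 1);
    close(ready[0]);
    if (!childReady)
    {
        kill(child, SIGKILL);
        waitpid(child, nullptr, 0);
        return -1;
    }

    kill(child, SIGTERM); // the follow-up SIGTERM
    int status = 0;
    if (waitpid(child, &status, 0) != child) { return -1; }
    if (WIFSIGNALED(status)) { return WTERMSIG(status); }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

With the handler still installed, `sigtermOutcome(false)` should report a clean exit. With the handler already reset, `sigtermOutcome(true)` should report that the child was killed by SIGTERM even though it was about to exit on its own.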

This is pretty annoying when monitoring applications for non-zero exit codes because every once in a while you'll get the SIGTERM exit code (usually 128 + 15 = 143, though in my environment it seems to be just 15).
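As a sketch of where the two numbers can come from: a process killed by a signal has no exit code of its own. The parent gets a wait status that carries the signal number, and bash and many other shells report that as 128 plus the signal number. A supervisor that reports the signal number itself would show 15 instead. Whether that explains the value seen here is an assumption. The helpers `shellStyleExitCode` and `waitStatusOfChildKilledBy` are hypothetical names for illustration.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Maps a waitpid() status to the code a shell such as bash would report in $?.
int shellStyleExitCode(int waitStatus)
{
    if (WIFEXITED(waitStatus)) { return WEXITSTATUS(waitStatus); }
    if (WIFSIGNALED(waitStatus)) { return 128 + WTERMSIG(waitStatus); }
    return -1; // stopped or continued, not terminated
}

// Hypothetical helper: runs a child that dies from `sig` under the default
// disposition and returns the raw wait status (or -1 on error).
int waitStatusOfChildKilledBy(int sig)
{
    const pid_t child = fork();
    if (child < 0) { return -1; }
    if (child == 0)
    {
        std::signal(sig, SIG_DFL);
        raise(sig);
        _exit(0); // only reached if the signal did not terminate the child
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child) { return -1; }
    return status;
}
```

For a child killed by SIGTERM, `WTERMSIG(status)` is 15 and `shellStyleExitCode(status)` is 143, so a monitor that wants to tell "killed by SIGTERM" apart from "exited with code 15" needs to check `WIFSIGNALED` rather than compare numbers.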

Operating system: NVidia DriveOS 6.0.6

Compiler version: Can't recall off the top of my head - some version of gcc for arm64 CPU

Eclipse iceoryx version: Fork based off of 8a5e08348a8fe830cdb1d92bf5b299a3e4bc8282 (upstream main as of 2024.11.1)

Observed result or behaviour: Occasionally applications have exit code 15, but all signs from RouDi and the application are that it exited gracefully

Expected result or behaviour: Exit code 0 always. Preferably we monitor for processes being alive or not some other way besides spamming SIGTERM. I believe the idea of a file lock was mentioned w.r.t. Iceoryx2?
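To illustrate the file lock idea, here is a rough sketch using POSIX `fcntl` record locks. It is not how iceoryx or iceoryx2 implement process monitoring. The helper names and the lock file path are hypothetical. The monitored process takes an exclusive lock at startup and holds it until it exits. The kernel drops the lock when the process dies, so the monitor can query the lock with `F_GETLK` instead of sending a signal.

```cpp
#include <csignal>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper for the monitored process: call once at startup and keep
// the returned descriptor open until exit. Returns -1 on failure.
int acquireLivenessLock(const char* path)
{
    const int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { return -1; }

    struct flock lock{};
    lock.l_type = F_WRLCK; // exclusive lock
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0; // 0 means "until end of file", i.e. the whole file
    if (fcntl(fd, F_SETLK, &lock) == -1)
    {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper for the monitor: true if another process holds the lock.
bool isLockHeldByAnotherProcess(const char* path)
{
    const int fd = open(path, O_RDWR);
    if (fd < 0) { return false; }

    struct flock probe{};
    probe.l_type = F_WRLCK;
    probe.l_whence = SEEK_SET;
    const bool held = (fcntl(fd, F_GETLK, &probe) == 0) && (probe.l_type != F_UNLCK);
    close(fd);
    return held;
}

// Demo: a child takes the lock, the parent sees it as held, kills the child,
// and then sees the lock released. No SIGTERM probing is involved.
bool lockTracksProcessLifetime(const char* path)
{
    int ready[2];
    if (pipe(ready) != 0) { return false; }
    const pid_t child = fork();
    if (child < 0) { return false; }
    if (child == 0)
    {
        close(ready[0]);
        const char ok = (acquireLivenessLock(path) >= 0) ? 1 : 0;
        if (write(ready[1], &ok, 1) != 1) { _exit(2); }
        for (;;) { pause(); } // stay alive until killed
    }
    close(ready[1]);
    char ok = 0;
    const bool childReady = (read(ready[0], &ok, 1) == 1) && (ok == 1);
    close(ready[0]);

    const bool heldWhileAlive = childReady && isLockHeldByAnotherProcess(path);
    kill(child, SIGKILL);
    waitpid(child, nullptr, 0);
    const bool releasedAfterDeath = !isLockHeldByAnotherProcess(path);
    unlink(path);
    return heldWhileAlive && releasedAfterDeath;
}
```

One caveat with `fcntl` locks: a process loses its lock as soon as it closes any descriptor for the locked file, so the monitored process should open the file in exactly one place.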

Conditions where it occurred / Performed steps: Have an application which connects to RouDi and uses the waitForTermination() signal handler method. Take a non-trivial amount of time to shut down such that RouDi sends several follow-up SIGTERMs after the first. Be unlucky and have the signal watcher partially destroyed when receiving one of those SIGTERMs.

Additional helpful information

gpalmer-latai, Jun 27 '24 19:06