nimbus-eth2 icon indicating copy to clipboard operation
nimbus-eth2 copied to clipboard

Eliminate sporadic quit() calls.

Open cheatfate opened this issue 10 months ago • 5 comments

This PR converts all the established quit calls into raiseDefect statements and also establishes place where all this specific Defects being handled and appropriate exit code returned.

The only quit() left is Doppelganger one, i make it available for 2 reasons.

  1. Is it ok to convert it to Defect?
  2. 129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129.

cheatfate avatar Jan 28 '25 22:01 cheatfate

Unit Test Results

       15 files  ±0    2 285 suites  ±0   1h 14m 50s :stopwatch: + 8m 31s   5 368 tests ±0    5 021 :heavy_check_mark: ±0  347 :zzz: ±0  0 :x: ±0  37 069 runs  ±0  36 579 :heavy_check_mark: ±0  490 :zzz: ±0  0 :x: ±0 

Results for commit 4b8d93d4. ± Comparison against base commit 67ac6fcb.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Jan 28 '25 23:01 github-actions[bot]

2. 129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129

I don't remember exactly but wasn't that number chosen (arbitrary?) so that for example a service could be set up not to restart the process when the program terminated with that specific exit code?

kdeme avatar Jan 29 '25 07:01 kdeme

129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129.

I don't remember the reasoning either, but it would break a lot of installations to change it now, because it's also part of the systemd service file that we recommend. If SIGHUP can really cause this, we should probably install a handler to ignore it.

arnetheduck avatar Jan 29 '25 08:01 arnetheduck

129 is invalid exit code number because it falls into the signals range. For Linux SIGHUP signal sent to process could also return exit code 129.

I don't remember the reasoning either, but it would break a lot of installations to change it now, because it's also part of the systemd service file that we recommend. If SIGHUP can really cause this, we should probably install a handler to ignore it.

It addresses https://github.com/status-im/nimbus-eth2/issues/3973 where previously it had been 1031, which was not a valid choice, and it doesn't conflict with https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Process%20Exit%20Codes.

https://nimbus.guide/doppelganger-detection.html documents this:

If any activity is detected, the node shuts down with exit code 129.

tersec avatar Jan 29 '25 09:01 tersec

It doesn't matter where we used 129 - we should change this number to any number which is < 128.

cheatfate avatar Jan 29 '25 11:01 cheatfate