scylla-machine-image Delay coredump.service shutdown after scylla.service shutdown

Issue description

[ ] This issue is a regression.
[ ] It is unknown if this issue is a regression.

Recently, in a test that performs a soft reboot of a node, scylla hit an 'aborting on shard' error. No coredump was created because of the following error:

!ERR | systemd-coredump[8354]: Failed to connect to coredump service: Connection refused

It looks like the reboot shuts down the coredump service without waiting for the scylla service to stop. Can we make it stop only after scylla is down?

Impact

No coredump is produced, which makes such issues harder to investigate.

Installation details

Kernel Version: 5.15.0-1028-aws
Scylla version (or git commit hash): 5.2.0~rc1-20230207.8ff4717fd010 with build-id 78fbb2c25e9244a62f57988313388a0260084528

Cluster size: 3 nodes (i4i.large)

Scylla Nodes used in this run:

longevity-5gb-1h-SoftRebootNodeMonk-db-node-249f30ed-3 (3.252.166.103 | 10.4.3.176) (shards: 2)
longevity-5gb-1h-SoftRebootNodeMonk-db-node-249f30ed-2 (34.245.61.44 | 10.4.1.56) (shards: 2)
longevity-5gb-1h-SoftRebootNodeMonk-db-node-249f30ed-1 (34.240.130.155 | 10.4.1.211) (shards: 2)

OS / Image: ami-05e1d6aa4f71f3f25 (aws: eu-west-1)

Test: longevity-5gb-1h-SoftRebootNodeMonkey-aws-test
Test id: 249f30ed-7007-4b8e-a320-1207ebca5e5d
Test name: scylla-5.2/nemesis/longevity-5gb-1h-SoftRebootNodeMonkey-aws-test
Test config file(s):

longevity-5gb-1h-SoftRebootNodeMonkey.yaml

Logs and commands

Restore Monitor Stack command: $ hydra investigate show-monitor 249f30ed-7007-4b8e-a320-1207ebca5e5d
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 249f30ed-7007-4b8e-a320-1207ebca5e5d

Logs:

db-cluster-249f30ed.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/249f30ed-7007-4b8e-a320-1207ebca5e5d/20230214_215652/db-cluster-249f30ed.tar.gz
sct-runner-249f30ed.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/249f30ed-7007-4b8e-a320-1207ebca5e5d/20230214_215652/sct-runner-249f30ed.tar.gz
monitor-set-249f30ed.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/249f30ed-7007-4b8e-a320-1207ebca5e5d/20230214_215652/monitor-set-249f30ed.tar.gz
loader-set-249f30ed.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/249f30ed-7007-4b8e-a320-1207ebca5e5d/20230214_215652/loader-set-249f30ed.tar.gz
parallel-timelines-report-249f30ed.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/249f30ed-7007-4b8e-a320-1207ebca5e5d/20230214_215652/parallel-timelines-report-249f30ed.tar.gz

Jenkins job URL

Feb 15 '23 09:02 soyacz

Is this different than https://github.com/scylladb/scylla-enterprise/issues/2648 which is solved via https://github.com/scylladb/scylladb/pull/12757 ?

Feb 15 '23 14:02 mykaul

Is this different than https://github.com/scylladb/scylla-enterprise/issues/2648 which is solved via https://github.com/scylladb/scylladb/pull/12757 ?

No, in this case the coredump happens during shutdown of a node, after systemd-coredump.socket has been closed.

Luckily for us this coredump happened in more cases.

Feb 15 '23 14:02 fruch

I was finally able to reproduce this by patching scylla to delay shutdown and raise SIGSEGV:

diff --git a/main.cc b/main.cc
index 35c4c25caa..1d167e3152 100644
--- a/main.cc
+++ b/main.cc
@@ -103,6 +103,8 @@
 
 #include <boost/algorithm/string/join.hpp>
 
+#include <signal.h>
+
 namespace fs = std::filesystem;
 
 seastar::metrics::metric_groups app_metrics;
@@ -471,6 +473,8 @@ static auto defer_verbose_shutdown(const char* what, Func&& func) {
         startlog.info("Shutting down {}", what);
         try {
             func();
+            seastar::sleep(std::chrono::minutes(1)).get();
+            raise(SIGSEGV);
             startlog.info("Shutting down {} was successful", what);
         } catch (...) {
             auto ex = std::current_exception();

So I tried to make coredump.service shut down after scylla-server.service by using the following drop-in conf:

$ cat /etc/systemd/system/scylla-server.service.d/dependencies.conf 
[Unit]
After=local-fs.target network-online.target systemd-coredump.socket var-lib-systemd-coredump.mount
Requires=local-fs.target network-online.target systemd-coredump.socket var-lib-systemd-coredump.mount
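
One way to confirm that systemd has actually merged a drop-in like this is to inspect the unit's effective properties. The following is only a minimal sketch: `has_unit` and `check_ordering` are hypothetical helper names (not part of scylla or systemd), and it assumes `systemctl show -p After --value` prints the ordering dependencies as a space-separated list.

```shell
#!/bin/sh
# has_unit LIST UNIT: succeed if UNIT appears as a whole word in the
# space-separated LIST (the format assumed for `systemctl show --value`).
has_unit() {
    for u in $1; do
        [ "$u" = "$2" ] && return 0
    done
    return 1
}

# check_ordering: hypothetical wrapper, meant to be run on the node after
# `systemctl daemon-reload`. It is only defined here, not called.
check_ordering() {
    after=$(systemctl show -p After --value scylla-server.service)
    if has_unit "$after" systemd-coredump.socket; then
        echo "scylla-server.service is ordered after systemd-coredump.socket"
    else
        echo "ordering drop-in is not in effect"
    fi
}
```

Since systemd stops units in the reverse of their start order, an `After=systemd-coredump.socket` entry on scylla-server.service is what should keep the socket open until scylla-server.service has stopped.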

Also, I found a GitHub issue saying that systemd-coredump@.service may get terminated during shutdown (https://github.com/systemd/systemd/issues/7176), so I also added a workaround for this:

$ cat /etc/systemd/system/systemd-coredump@.service.d/killsignal.conf
[Service]
KillSignal=SIGCONT

I thought systemd-coredump would now capture the coredump correctly, but it does not. systemd-coredump still reports an error, even though systemd-coredump.socket is shut down after scylla-server.service:

Jul 01 00:27:41 ubuntu-jammy scylla[1054]: Segmentation fault on shard 0.
Jul 01 00:27:41 ubuntu-jammy scylla[1054]: Backtrace:
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   0x56b8e88
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   0x56ed416
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   /opt/scylladb/libreloc/libc.so.6+0x3cb1f
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   /opt/scylladb/libreloc/libc.so.6+0x8ce5b
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   /opt/scylladb/libreloc/libc.so.6+0x3ca75
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   0x13c1bb0
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   0x13c21b3
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   0x127f256
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   0x1272375
Jul 01 00:27:41 ubuntu-jammy scylla[1054]:   0x59440f6
Jul 01 00:27:41 ubuntu-jammy systemd-coredump[2271]: Failed to send coredump fd: Broken pipe
Jul 01 00:28:07 ubuntu-jammy systemd[1]: scylla-server.service: Main process exited, code=dumped, status=11/SEGV
Jul 01 00:28:07 ubuntu-jammy systemd[1]: scylla-server.service: Failed with result 'core-dump'.
Jul 01 00:28:07 ubuntu-jammy systemd[1]: Stopped Scylla Server.
Jul 01 00:28:07 ubuntu-jammy systemd[1]: scylla-server.service: Consumed 14min 45.846s CPU time.
...
Jul 01 00:28:07 ubuntu-jammy systemd[1]: systemd-coredump.socket: Deactivated successfully.
Jul 01 00:28:07 ubuntu-jammy systemd[1]: Closed Process Core Dump Socket.
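
To spot this failure in a saved journal without reading it line by line, something like the following sketch may help. `count_coredump_failures` is a hypothetical helper name; it simply counts journal lines in which a `systemd-coredump[PID]` process logs a message starting with "Failed to", which covers both error messages quoted in this issue.

```shell
#!/bin/sh
# count_coredump_failures: read journal text on stdin and print the number
# of systemd-coredump lines whose message starts with "Failed to".
# `|| true` keeps the exit status 0 when grep finds no match.
count_coredump_failures() {
    grep -c 'systemd-coredump\[[0-9]*]: Failed to' || true
}
```

Assuming the journal is persistent across reboots, it could be fed with the previous boot's log, for example `journalctl -b -1 | count_coredump_failures`.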

I tried again and again with slightly different configurations, but systemd-coredump never worked during shutdown. So I concluded that the only possible workaround is to stop using the systemd-coredump handler in kernel.core_pattern and set a file path in kernel.core_pattern directly. I will send a workaround for that.
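
For illustration only, a file-path core_pattern of the kind described above might be generated as in the sketch below. This is not the actual workaround patch: the helper names, the target directory and the sysctl.d file name `99-scylla-coredump.conf` are all assumptions. The `%e`, `%p` and `%t` specifiers (executable name, PID, dump time) are the ones documented in core(5).

```shell
#!/bin/sh
# core_pattern_line DIR: print a sysctl setting that makes the kernel write
# core files directly into DIR instead of piping them to systemd-coredump.
# %e = executable name, %p = PID, %t = time of dump (seconds since epoch).
core_pattern_line() {
    printf 'kernel.core_pattern=%s/core.%%e.%%p.%%t\n' "$1"
}

# install_core_pattern DIR: hypothetical installer, needs root.
# The sysctl.d file name is an assumption. Defined here but not called.
install_core_pattern() {
    mkdir -p "$1" &&
    core_pattern_line "$1" > /etc/sysctl.d/99-scylla-coredump.conf &&
    sysctl --system
}
```

With a plain path the kernel writes the core file itself, so nothing depends on systemd-coredump.socket still being open during shutdown. The trade-off is that the files are not compressed or cleaned up the way systemd-coredump would handle them.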

Jul 04 '23 19:07 syuu1228

@avikivity Do you have any ideas about this issue?

Jul 08 '23 14:07 syuu1228

Opened issue on systemd GH https://github.com/systemd/systemd/issues/28338

Jul 10 '23 11:07 syuu1228

@avikivity ping, do you have any idea?

Jul 20 '23 14:07 syuu1228

Sorry for missing the issue. I often skip over scylla-machine-image because I don't maintain it. I'll look over it now.

Oct 24 '23 16:10 avikivity

I guess the problem is that, even with the dependency, systemd thinks the process is done (not sure why - the PID still exists while dumping core) so it stops systemd-coredump while the core dump is in progress. Very funky.

Oct 24 '23 16:10 avikivity

@avikivity since we decided not to merge the workaround, what else can we do about this? Maybe we should document it?

Oct 27 '23 23:10 syuu1228
