piscsi
piscsi copied to clipboard
SHUT_DOWN command does not allow python interrupt handler to take effect
The SHUT_DOWN command in rascsi somehow takes effect faster than doing 'shutdown -h now' or 'reboot' on the shell. The rascsi OLED monitor script has an interrupt handler for graceful termination of the script, but it does not have time to fully execute when issuing a SHUT_DOWN command through rascsi.
This could very well be a bug with the interrupt handler I'm using in the python script, but raising this ticket for further investigation.
@rdmark When you run "init 0" instead of "reboot" or "shutdown -h now", does your monitor script have the same issue? In general, there are no guaranteed minimum shutdown times. I don't think it is legal to rely on any special timing when a system is shut down. User processes are stopped in random order, as far as I know. But maybe the reboot() system call is not equivalent to "shutdown" on the command line. I will try to find this out.
@rdmark Please try whether feature branch feature_shutdown_improvement resolves your issue. Instead of using the reboot() system call I use the init command instead.
@uweseimet Thanks for looking into this. The use init in the code has improved the situation a bit I think, but not to the point where it works fully the same as commands from the shell. A few observations from running a few tests today:
From the shell, "init 0", "reboot" and "shutdown -h now" all have the same effect, which is that my interrupt handler takes effect as expected. Testing with some debug output, what I think is happening is that the systemd configuration that I have defines three ExecStop steps. When the system is shut down from the shell with either the above three commands, all three ExecStop steps are executed. But when it is shut down from RaSCSI code, only the first step is carried out.
Now, when using the reboot() system call, if I read systemlog correctly, every single systemd graceful shutdown handling is ignored. Normally you should see a bunch of log messages where nginx, avahi, etc. etc. are being stopped. When using the init command, these shutdown messages are now shown. So I think the new code is an improvement. This is a sample:
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped target Multi-User System.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped target Login Prompts.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped target Timers.
Dec 19 17:44:02 rascsi3b systemd[1]: apt-daily-upgrade.timer: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped Daily apt upgrade and clean activities.
Dec 19 17:44:02 rascsi3b systemd[1]: apt-daily.timer: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped Daily apt download activities.
Dec 19 17:44:02 rascsi3b systemd[1]: e2scrub_all.timer: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped Periodic ext4 Online Metadata Check for All Filesystems.
Dec 19 17:44:02 rascsi3b systemd[1]: logrotate.timer: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped Daily rotation of log files.
Dec 19 17:44:02 rascsi3b systemd[1]: man-db.timer: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped Daily man-db regeneration.
Dec 19 17:44:02 rascsi3b systemd[1]: systemd-tmpfiles-clean.timer: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped target System Time Synchronized.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopped target System Time Set.
Dec 19 17:44:02 rascsi3b systemd[1]: systemd-rfkill.socket: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Dec 19 17:44:02 rascsi3b systemd[1]: Unmounting RPC Pipe File System...
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping Avahi mDNS/DNS-SD Stack...
Dec 19 17:44:02 rascsi3b avahi-daemon[319]: Got SIGTERM, quitting.
Dec 19 17:44:02 rascsi3b avahi-daemon[319]: Leaving mDNS multicast group on interface rascsi_bridge.IPv6 w
ith address fe80::d4ec:25ff:fe1d:275d.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping Regular background program processing daemon...
Dec 19 17:44:02 rascsi3b avahi-daemon[319]: Leaving mDNS multicast group on interface enxb827ebb5a315.IPv6
with address 2601:602:8d00:f8a0:ebb4:38cb:8ae5:d6f8.
Dec 19 17:44:02 rascsi3b avahi-daemon[319]: Leaving mDNS multicast group on interface enxb827ebb5a315.IPv4
with address 10.0.0.241.
Dec 19 17:44:02 rascsi3b avahi-daemon[319]: Leaving mDNS multicast group on interface lo.IPv6 with address
::1.
Dec 19 17:44:02 rascsi3b avahi-daemon[319]: Leaving mDNS multicast group on interface lo.IPv4 with address
127.0.0.1.
Dec 19 17:44:02 rascsi3b avahi-daemon[319]: avahi-daemon 0.8 exiting.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping dphys-swapfile - set up, mount/unmount, and delete a swap fi
le...
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping Getty on tty1...
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping Macproxy service...
Dec 19 17:44:02 rascsi3b systemd[7907]: run-rpc_pipefs.mount: Succeeded.
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping Monitor RaSCSI service...
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping A high performance web server and a reverse proxy server...
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping RaSCSI-Web service...
Dec 19 17:44:02 rascsi3b systemd[1]: Stopping LSB: Switch to ondemand cpu governor (unless shift key is pr
essed)...
However, for monitor_rascsi, the issue remains that the latter ExecStop commands are being ignored, or don't have enough time to execute.
@rdmark The code in this branch calls "init 0" or "init 6", resp., with a system() call. This is IMO equivalent to executing "init 0" or "init 6" from the shell. I suggest you have a look at the C++ code (just a few lines, just search for "init 6" to find them). You can replace "init 6" with a different command in rascsi.cpp, e.g. "reboot", in order to see whether this makes a difference.
@uweseimet I've been modifying the code in rascsi.cpp and trying different system calls for a bit, but no luck so far. Discussion in this stackoverflow thread seems to suggest that regular system calls won't shut down systemd services correctly, and that you should use DBus instead. I'll keep researching!
We spent quite an effort troubleshooting this already, and I don't think it's worth spending more time on such a trivial issue. The real solution might be to write an entirely new python event handler class that is more capable.