systemd
systemd copied to clipboard
killall: resume storage daemons
killall causes a traversal of /proc. When the root filesystem is backed by a storage daemon (fuse, NBD), and said storage daemon is suspended (SIGSTOP), a deadlock may occur. Resume storage daemons before invoking killall.
Hmm, why bother? If you suspend your root storage daemons you kinda are asking the system pretty explicitly that you want the system to hang?
I am not convinced we want this. It seems entirely unnecessary? root storage daemons are outside of our management, and I see no case for changing that for a case that doesn't happen unless the user kinda asks for something crazy. The complexity and code this adds appears entirely unwarranted to me.
Hence, can you explain why we should bother with this?
I am running a custom system where the root filesystem is backed by a fuse daemon (https://systemd.io/ROOT_STORAGE_DAEMONS/). During shutdown or suspend the system freezes consistently. To diagnose the issue I added log statements until it was clear that the lock-up occurred while in the killall function. As stated in the log message, since killall traverses the /proc directory it needs the fuse daemon to respond to readdir requests on /. The daemon cannot service these requests if it is suspended.
but why was that fuse daemon suspended? if you susped your fuse daemon things will hang, that's hardly surprising? And maybe the lesson to learn there is not to suspend the fuse daemon if you need it?
but why was that fuse daemon suspended?
kill(-1, SIGSTOP)
in broadcast_signal
does it, right before calling killall
.
hmm, i see, indeed. but it sounds wrong to resume stuff that probably shouldn't have been stopped in the first place.
it appears to me the fix should be to call killall() with SIGSTOP first, instead off kill(-1, SIGSTOP). And similar for resume.
kill(-1, ...)
is nice because it is atomic, I think; I'm concerned using killall to suspend the processes could miss some processes, if some process forks while killall enumerates /proc
.
kill(-1, ...)
is nice because it is atomic, I think
It's not though. The interface has this nice atomic smell to it, but in the kernel it's just a loop that races against everything else the same way as you do it in userspace.
kill(-1, ...)
is nice because it is atomic, I thinkIt's not though. The interface has this nice atomic smell to it, but in the kernel it's just a loop that races against everything else the same way as you do it in userspace.
The linux implementation of kill(-1, signal)
holds tasklist_lock
for reading for the loop duration.
https://github.com/torvalds/linux/blob/v5.17/kernel/signal.c#L1599-L1619
The linux implementation of
kill(-1, signal)
holdstasklist_lock
for reading for the loop duration.https://github.com/torvalds/linux/blob/v5.17/kernel/signal.c#L1599-L1619
interesting...
so, the issue is traversal of the root fs, right?
so what about this: open an fd to /proc/ first, and only use that for traversal. That way we should not need to open / again. Wouldn#t that suffice?
open an fd to /proc/ first, and only use that for traversal. That way we should not need to open / again.
I had the same idea a couple of minutes ago. I will try to find some time to try this this weekend and let you know.
ping?
I have been working on this on and off (more off than on TBF).
I realized along the way that a path other than /proc/*
was accessed (/etc/initrd-release
).
I like the openat
approach to access /proc
but it has led to a larger change: https://github.com/achille-fouilleul/systemd/tree/afoui/killall-open-proc-before-suspending-all-processes
I ran my tests using a casync fuse mount. What I would like to do now is to have the code exercised by the test suite.