systemd icon indicating copy to clipboard operation
systemd copied to clipboard

killall: resume storage daemons

Open achille-fouilleul opened this issue 2 years ago • 14 comments

killall causes a traversal of /proc. When the root filesystem is backed by a storage daemon (fuse, NBD), and said storage daemon is suspended (SIGSTOP), a deadlock may occur. Resume storage daemons before invoking killall.

achille-fouilleul avatar Mar 27 '22 09:03 achille-fouilleul

Hmm, why bother? If you suspend your root storage daemons you kinda are asking the system pretty explicitly that you want the system to hang?

poettering avatar Mar 28 '22 12:03 poettering

I am not convinced we want this. It seems entirely unnecessary? root storage daemons are outside of our management, and I see no case for changing that for a case that doesn't happen unless the user kinda asks for something crazy. The complexity and code this adds appears entirely unwarranted to me.

Hence, can you explain why we should bother with this?

poettering avatar Mar 28 '22 12:03 poettering

I am running a custom system where the root filesystem is backed by a fuse daemon (https://systemd.io/ROOT_STORAGE_DAEMONS/). During shutdown or suspend the system freezes consistently. To diagnose the issue I added log statements until it was clear that the lock-up occurred while in the killall function. As stated in the log message, since killall traverses the /proc directory it needs the fuse daemon to respond to readdir requests on /. The daemon cannot service these requests if it is suspended.

achille-fouilleul avatar Mar 28 '22 13:03 achille-fouilleul

but why was that fuse daemon suspended? if you susped your fuse daemon things will hang, that's hardly surprising? And maybe the lesson to learn there is not to suspend the fuse daemon if you need it?

poettering avatar Mar 28 '22 13:03 poettering

but why was that fuse daemon suspended?

kill(-1, SIGSTOP) in broadcast_signal does it, right before calling killall.

achille-fouilleul avatar Mar 28 '22 13:03 achille-fouilleul

hmm, i see, indeed. but it sounds wrong to resume stuff that probably shouldn't have been stopped in the first place.

it appears to me the fix should be to call killall() with SIGSTOP first, instead off kill(-1, SIGSTOP). And similar for resume.

poettering avatar Mar 28 '22 13:03 poettering

kill(-1, ...) is nice because it is atomic, I think; I'm concerned using killall to suspend the processes could miss some processes, if some process forks while killall enumerates /proc.

achille-fouilleul avatar Mar 28 '22 14:03 achille-fouilleul

kill(-1, ...) is nice because it is atomic, I think

It's not though. The interface has this nice atomic smell to it, but in the kernel it's just a loop that races against everything else the same way as you do it in userspace.

poettering avatar Mar 28 '22 14:03 poettering

kill(-1, ...) is nice because it is atomic, I think

It's not though. The interface has this nice atomic smell to it, but in the kernel it's just a loop that races against everything else the same way as you do it in userspace.

The linux implementation of kill(-1, signal) holds tasklist_lock for reading for the loop duration.

https://github.com/torvalds/linux/blob/v5.17/kernel/signal.c#L1599-L1619

vcaputo avatar Mar 28 '22 16:03 vcaputo

The linux implementation of kill(-1, signal) holds tasklist_lock for reading for the loop duration.

https://github.com/torvalds/linux/blob/v5.17/kernel/signal.c#L1599-L1619

interesting...

poettering avatar Apr 01 '22 15:04 poettering

so, the issue is traversal of the root fs, right?

so what about this: open an fd to /proc/ first, and only use that for traversal. That way we should not need to open / again. Wouldn#t that suffice?

poettering avatar Apr 01 '22 15:04 poettering

open an fd to /proc/ first, and only use that for traversal. That way we should not need to open / again.

I had the same idea a couple of minutes ago. I will try to find some time to try this this weekend and let you know.

achille-fouilleul avatar Apr 01 '22 15:04 achille-fouilleul

ping?

poettering avatar Aug 03 '22 09:08 poettering

I have been working on this on and off (more off than on TBF). I realized along the way that a path other than /proc/* was accessed (/etc/initrd-release). I like the openat approach to access /proc but it has led to a larger change: https://github.com/achille-fouilleul/systemd/tree/afoui/killall-open-proc-before-suspending-all-processes I ran my tests using a casync fuse mount. What I would like to do now is to have the code exercised by the test suite.

achille-fouilleul avatar Aug 03 '22 19:08 achille-fouilleul