Why does `nix-daemon.service` set `KillMode=process`
Describe the bug
The nix-daemon.service.in file (which is used in NixOS to generate /etc/systemd/system/nix-daemon.service) sets KillMode=process.
Quoting from the systemd.kill man page (emphasis mine):
KillMode=Specifies how processes of this unit shall be killed. One ofcontrol-group,mixed,process,none.If set to
control-group, all remaining processes in the control group of this unit will be killed on unit stop (for services: after the stop command is executed, as configured withExecStop=). If set tomixed, theSIGTERMsignal (see below) is sent to the main process while the subsequentSIGKILLsignal (see below) is sent to all remaining processes of the unit's control group. If set toprocess, only the main process itself is killed (not recommended!). If set tonone, no process is killed (strongly recommended against!). In this case, only the stop command will be executed on unit stop, but no process will be killed otherwise. Processes remaining alive after stop are left in their control group and the control group continues to exist after stop unless empty.Note that it is not recommended to set
KillMode=toprocessor evennone, as this allows processes to escape the service manager's lifecycle and resource management, and to remain running even while their service is considered stopped and is assumed to not consume any resources....
Defaults to
control-group.
Is there a good reason, why KillMode=process is used here? I think that using control-group or mixed seems more appropriate for nix-daemon.
I suspect, that this misconfiguration might be the cause of some issues regarding nix-daemon not behaving well under OOM pressure on NixOS. Also, see systemd/systemd#32687.
Priorities
Add :+1: to issues you find important.
nix-daemon uses a forking model where the lifetime of the connection is coupled with long-running operations, both from the client as well as server's perspective of "operations"
- A build can take a long time. By killing existing connection workers, all current builds will be terminated; usually for no good reason.
- A client may remain connected for a long time, and is not equipped to recover from a connection failure (in fact it will refuse to connect ever again, which seems a little excessive, but might prevent accidental fork bombs perhaps; not sure). Examples
- Cachix watch-store daemon
- Hercules CI Agent main process
- Hydra
- Any other processes that connect for a long time, and/or perform many operations
So KillMode=process is intentional, and while it's not the only possible design for a system Nix service, changing it would require a significant architectural change.
I don't quite understand your explanation. KillMode is supposed to control what happens when somebody (or something) requests for the nix-daemon service to be stopped. For example, this can happen if the user runs
sudo systemctl stop nix-daemon.service
or if the system is currently running out of memory and the user has configured systemd-oomd in such a way that it decided that nix-daemon.service should be killed to relieve memory pressure.
It seems to me that in these circumstances, terminating all current builds is exactly what should happen. nix-daemon runs its local build processes as direct children, in the same slice and (by default) in the same cgroup.
I don't think that it makes sense for systemctl stop nix-daemon.service to keep the builds going as orphan processes. Are the builders even capable of storing their results to /nix/store after they finish if no nix-daemon is running at that point?
Isn't the stop behavior also used when updating the unit, at least on NixOS? In those cases it's usually not required to stop everything; that would be disruptive, as I described, although it'd be wise to gracefully restart any clients. It's certainly not without flaws.
Are the builders even capable of storing their results to
/nix/storeafter they finish if nonix-daemonis running at that point?
I haven't encountered any problems that would suggest that this breaks. As far as I can tell, the kernel and/or systemd don't release resources until the last worker is gone.
Some small changes we could make to improve matters:
- [ ] Reopen daemon connections instead of outright refusing after the first disconnect or failure to connect
- [ ] Add channel multiplexing to the daemon protocol so we can send heartbeat messages and time out stalled connections and clients, as well as signal to the client that the connection should be reestablished because of an update
- [ ] Make the newly started nix-daemon terminate "foreign" processes in its cgroup after some timeout?
Related
- https://github.com/NixOS/nix/issues/8939
Isn't the stop behavior also used when updating the unit, at least on NixOS?
Hm, I was under the impression that there was some way to support reloading services without killing them by using ExecReload and SIGHUP, but that might only be relevant when the configuration of the program itself (ie nix.conf) is changed (systemctl reload), not when the service itself (ie nix-daemon.service) is changed (systemctl restart/daemon-reload).
Killing all builds due to a restart might indeed be undesirable, I haven't thought of that. I wonder if there is some way to distinguish between these two cases.
Nix could also do something systemd does to update the PID 1: it could serialize its state and re-execute itself (or let the service manager do that). For that, the socket will need to be kept alive by systemd and passed via a file descriptor. But I don't think this works with long-running connections.
We can't serialize the worker/socket bidirectional pipes, but the protocol could be updated to allow the daemon worker to initiate a reset. Clients that support this can then transparently reopen the connection, so that they get a freshly configured worker for their next operations.
For those connections to complete their current operation without error, it requires that pre-existing worker processes in the slice are not terminated on restart. This can be configured in systemd.
It does have the side effect that defunct process of some kind can stick around indefinitely, but this is actually a different problem that should be solved separately and not rely on regular systemctl restarts.