nix Why does `nix-daemon.service` set `KillMode=process`

Describe the bug

The nix-daemon.service.in file (which is used in NixOS to generate /etc/systemd/system/nix-daemon.service) sets KillMode=process.

Quoting from the systemd.kill man page (emphasis mine):

KillMode= Specifies how processes of this unit shall be killed. One of control-group, mixed, process, none.

If set to control-group, all remaining processes in the control group of this unit will be killed on unit stop (for services: after the stop command is executed, as configured with ExecStop=). If set to mixed, the SIGTERM signal (see below) is sent to the main process while the subsequent SIGKILL signal (see below) is sent to all remaining processes of the unit's control group. If set to process, only the main process itself is killed (not recommended!). If set to none, no process is killed (strongly recommended against!). In this case, only the stop command will be executed on unit stop, but no process will be killed otherwise. Processes remaining alive after stop are left in their control group and the control group continues to exist after stop unless empty.

Note that it is not recommended to set KillMode= to process or even none, as this allows processes to escape the service manager's lifecycle and resource management, and to remain running even while their service is considered stopped and is assumed to not consume any resources.

...

Defaults to control-group.

Is there a good reason, why KillMode=process is used here? I think that using control-group or mixed seems more appropriate for nix-daemon.

I suspect, that this misconfiguration might be the cause of some issues regarding nix-daemon not behaving well under OOM pressure on NixOS. Also, see systemd/systemd#32687.

Priorities

Add :+1: to issues you find important.

Jun 26 '24 02:06 ruro

nix-daemon uses a forking model where the lifetime of the connection is coupled with long-running operations, both from the client as well as server's perspective of "operations"

A build can take a long time. By killing existing connection workers, all current builds will be terminated; usually for no good reason.
A client may remain connected for a long time, and is not equipped to recover from a connection failure (in fact it will refuse to connect ever again, which seems a little excessive, but might prevent accidental fork bombs perhaps; not sure). Examples
- Cachix watch-store daemon
- Hercules CI Agent main process
- Hydra
- Any other processes that connect for a long time, and/or perform many operations

So KillMode=process is intentional, and while it's not the only possible design for a system Nix service, changing it would require a significant architectural change.

Jun 26 '24 03:06 roberth

I don't quite understand your explanation. KillMode is supposed to control what happens when somebody (or something) requests for the nix-daemon service to be stopped. For example, this can happen if the user runs

sudo systemctl stop nix-daemon.service

or if the system is currently running out of memory and the user has configured systemd-oomd in such a way that it decided that nix-daemon.service should be killed to relieve memory pressure.

It seems to me that in these circumstances, terminating all current builds is exactly what should happen. nix-daemon runs its local build processes as direct children, in the same slice and (by default) in the same cgroup.

I don't think that it makes sense for systemctl stop nix-daemon.service to keep the builds going as orphan processes. Are the builders even capable of storing their results to /nix/store after they finish if no nix-daemon is running at that point?

Jun 26 '24 09:06 ruro

Isn't the stop behavior also used when updating the unit, at least on NixOS? In those cases it's usually not required to stop everything; that would be disruptive, as I described, although it'd be wise to gracefully restart any clients. It's certainly not without flaws.

Are the builders even capable of storing their results to /nix/store after they finish if no nix-daemon is running at that point?

I haven't encountered any problems that would suggest that this breaks. As far as I can tell, the kernel and/or systemd don't release resources until the last worker is gone.

Some small changes we could make to improve matters:

[ ] Reopen daemon connections instead of outright refusing after the first disconnect or failure to connect
[ ] Add channel multiplexing to the daemon protocol so we can send heartbeat messages and time out stalled connections and clients, as well as signal to the client that the connection should be reestablished because of an update
[ ] Make the newly started nix-daemon terminate "foreign" processes in its cgroup after some timeout?

Jun 26 '24 09:06 roberth

https://github.com/NixOS/nix/issues/8939

Jun 26 '24 09:06 roberth

Isn't the stop behavior also used when updating the unit, at least on NixOS?

Hm, I was under the impression that there was some way to support reloading services without killing them by using ExecReload and SIGHUP, but that might only be relevant when the configuration of the program itself (ie nix.conf) is changed (systemctl reload), not when the service itself (ie nix-daemon.service) is changed (systemctl restart/daemon-reload).

Killing all builds due to a restart might indeed be undesirable, I haven't thought of that. I wonder if there is some way to distinguish between these two cases.

Jun 26 '24 10:06 ruro

Nix could also do something systemd does to update the PID 1: it could serialize its state and re-execute itself (or let the service manager do that). For that, the socket will need to be kept alive by systemd and passed via a file descriptor. But I don't think this works with long-running connections.

Aug 06 '24 15:08 vikanezrimaya

We can't serialize the worker/socket bidirectional pipes, but the protocol could be updated to allow the daemon worker to initiate a reset. Clients that support this can then transparently reopen the connection, so that they get a freshly configured worker for their next operations.

For those connections to complete their current operation without error, it requires that pre-existing worker processes in the slice are not terminated on restart. This can be configured in systemd. It does have the side effect that defunct process of some kind can stick around indefinitely, but this is actually a different problem that should be solved separately and not rely on regular systemctl restarts.

Aug 06 '24 18:08 roberth