libstore: Add load-limit setting to dynamically control parallelism
On busy machines where Nix co-exists with other workloads, parallelism may not work as intended. For example, consider a 64 core machine whose load average is 24 and where Nix is limited to 8 cores. By default, -j8 -l8 will be passed to GNU Make. Since the load average exceeds 8, no parallelism will take place despite the fact that 8 cores are available. In this case, load-limit should be set to 0 to prevent the -lN flag from being passed to GNU Make.
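The mapping from the two settings to Make flags described above can be sketched roughly as follows. This is a minimal Python illustration, not the code from this PR; the function name make_parallel_flags and its parameter names are hypothetical, and the only behavior assumed is what the description states: -jN is always passed, and -lN is dropped when the load limit is 0.

```python
from typing import List


def make_parallel_flags(cores: int, load_limit: int) -> List[str]:
    """Build the parallelism flags handed to GNU Make.

    `cores` and `load_limit` are hypothetical stand-ins for the settings
    discussed in this PR. A load_limit of 0 means "pass no -lN flag".
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if load_limit < 0:
        raise ValueError("load_limit must not be negative")

    flags = [f"-j{cores}"]
    if load_limit > 0:
        # Only add the load-average cap when a limit is requested.
        flags.append(f"-l{load_limit}")
    return flags
```

With cores set to 8, a load limit of 8 yields the default pair of flags, while a load limit of 0 leaves only the job count.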
See also: https://github.com/NixOS/nixpkgs/pull/174473
The downside of this approach is that every concurrent Nix derivation will use up to 8 parallel jobs, without regard to load.
Maybe the best solution is to make the GNU jobserver available in the sandbox...
Shouldn't nix-daemon take care of counting the leftover cores? I expect that when I run nix build --cores 8, I'm telling a build that it's allowed to occupy up to 8 cores (regardless of whether it's running GNU Make or a TensorFlow job).
Removing the load average limit should only be done if a system-wide jobserver limits the jobs. A proof of concept was implemented in https://github.com/NixOS/nixpkgs/pull/143820.
The current setting (jobs equal to load limit) leaves CPU capacity unused when:
- The system has a lot of IO tasks (e.g. busy hard drives).
- The system runs CPU-intensive processes.
I think this PR can be seen as a simple solution that gives more control over the utilization of the system.
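The effect described above can be modeled with a deliberately simplified sketch. It relies on the documented behavior of GNU Make's -l flag: no new job is started while other jobs are running and the load average is at or above the limit. The function effective_jobs is hypothetical, and it ignores the fact that real Make re-checks the load average as the build proceeds.

```python
def effective_jobs(jobs: int, load_limit: float, load_avg: float) -> int:
    """Approximate how many jobs `make -jJOBS -lLOAD_LIMIT` runs at once.

    Simplified steady-state model (an assumption, not Make's actual code):
    when the load average is at or above the limit, Make keeps only a
    single job running. A load_limit of 0 stands for "no -l flag passed".
    """
    if load_limit > 0 and load_avg >= load_limit:
        return 1
    return jobs
```

For the example from the PR description (load average 24, limit 8, 8 jobs), this model yields a single job, and removing the limit restores all 8.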
The PoC in https://github.com/NixOS/nixpkgs/pull/143820 sets both -j and -l to ${NIX_BUILD_CORES}. Even with a jobserver running, will Make properly ask for tokens if the load average exceeds NIX_BUILD_CORES?
> The downside of this approach is that every concurrent Nix derivation will use up to 8 parallel jobs, without regard to load.
If the default is cores == load-limit, nothing would change. However, on machines with mixed workloads it is highly desirable to remove this limitation (see https://github.com/NixOS/nixpkgs/pull/174473).
We're open to moving the jobserver prototype we made in https://github.com/NixOS/nixpkgs/pull/143820 into Nix itself. Unfortunately, the GNU jobserver protocol isn't universally supported; GHC, for example, uses semaphores instead of pipes. We also can't support both the GNU protocol and SysV-semaphore-like protocols with the same implementation without kernel support (i.e., FUSE or other drivers).
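For readers unfamiliar with the pipe-based protocol mentioned here, the core idea can be sketched as a toy token pool. This is an illustrative Python sketch, not the prototype from the linked PR; the class PipeJobserver and its methods are hypothetical. It assumes the usual GNU Make convention that each participant owns one implicit slot, so only slots - 1 tokens go into the pipe. Real Make also advertises the pipe to child processes through MAKEFLAGS, which is omitted here, and semaphore-based schemes such as GHC's are not modeled at all.

```python
import os
from typing import Optional


class PipeJobserver:
    """Toy token pool in the style of the GNU Make pipe jobserver."""

    def __init__(self, slots: int) -> None:
        if slots < 1:
            raise ValueError("slots must be at least 1")
        self.read_fd, self.write_fd = os.pipe()
        # One slot is implicit for the caller, so preload slots - 1 tokens.
        os.write(self.write_fd, b"+" * (slots - 1))

    def acquire(self) -> bytes:
        """Block until a token is available, then return it."""
        return os.read(self.read_fd, 1)

    def try_acquire(self) -> Optional[bytes]:
        """Return a token, or None if the pool is currently empty."""
        os.set_blocking(self.read_fd, False)
        try:
            return os.read(self.read_fd, 1)
        except BlockingIOError:
            return None
        finally:
            os.set_blocking(self.read_fd, True)

    def release(self, token: bytes) -> None:
        """Hand a token back so another job may start."""
        os.write(self.write_fd, token)

    def close(self) -> None:
        os.close(self.read_fd)
        os.close(self.write_fd)
```

A client would call acquire before starting each additional job and release when that job finishes; the byte read from the pipe is the token that must be written back.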
I'm not a cgroups expert by any means and I haven't tried too hard, but I couldn't find a satisfying tool in there. Well, limiting RAM could be one way, as I think RAM exhaustion is the main risk of overly aggressive parallelism here. As for CPUs... cgroups offer limiting to a particular subset of the system's CPUs (not a count but a particular subset, sadly); such a model seemed hard to apply well here, but I suspect that detection tools like nproc would then at least report and use the size of this subset.
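The suspicion about nproc can be checked from inside a restricted process. As a rough sketch, assuming Linux, the scheduler affinity mask reflects a CPU-subset restriction, and counting it is a reasonable proxy for what nproc reports. The helper name usable_cpu_count is hypothetical.

```python
import os


def usable_cpu_count() -> int:
    """Count the CPUs this process is allowed to run on.

    On Linux this uses the scheduler affinity mask, which shrinks when the
    process is confined to a CPU subset. Elsewhere it falls back to the
    total CPU count (an assumption made for portability only).
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1
```

Comparing this value with os.cpu_count() inside and outside a restricted cgroup would show whether build tools that auto-detect parallelism would pick up the smaller number.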
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-04-28-nix-team-meeting-minutes-50/27698/1