
libstore: Add load-limit setting to dynamically control parallelism

Open · centromere opened this pull request 2 years ago · 4 comments

On busy machines where Nix coexists with other workloads, build parallelism may be throttled far below what Nix has been allotted. For example, consider a 64-core machine whose load average is 24 and where Nix is limited to 8 cores. By default, -j8 -l8 is passed to GNU Make. Since the load average already exceeds 8, Make never runs more than one job at a time, even though 8 cores are available to Nix. In this case, load-limit should be set to 0 so that no -lN flag is passed to GNU Make.
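The GNU Make manual describes -l as: no new jobs are started if other jobs are running and the load average is at least the given value. The simplified model below is a sketch (the function names are hypothetical, and it ignores any refinements Make applies to the raw load average); it shows why -j8 -l8 collapses to a single job on the 64-core machine from the example.

```python
def make_may_start_job(running, job_limit, load_avg, load_limit=None):
    """Simplified model of GNU Make's -j/-l gating.

    running    -- number of jobs this make is currently running
    job_limit  -- the N from -jN
    load_avg   -- current system load average
    load_limit -- the N from -lN, or None when -l is not passed
    """
    if running >= job_limit:
        return False  # all -j slots are in use
    if load_limit is not None and running > 0 and load_avg >= load_limit:
        return False  # -l only blocks when other jobs are already running
    return True


def effective_jobs(job_limit, load_avg, load_limit=None):
    """Count how many jobs Make would start back to back at a fixed load."""
    running = 0
    while make_may_start_job(running, job_limit, load_avg, load_limit):
        running += 1
    return running


# 64-core machine, load average 24, Nix limited to 8 cores:
print(effective_jobs(8, 24.0, load_limit=8))     # -j8 -l8: 1
print(effective_jobs(8, 24.0, load_limit=None))  # -j8 only: 8
```

Because the first job is always allowed, -l never blocks a build entirely; it only removes the parallelism, which is exactly the effect described above.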

See also: https://github.com/NixOS/nixpkgs/pull/174473

centromere, Aug 02 '22 20:08

The downside of this approach is that every concurrent Nix derivation will use up to 8 parallel jobs, without regard to load.

Maybe the best solution is to make the GNU Make jobserver available in the sandbox...

edolstra, Aug 03 '22 13:08

Shouldn't nix-daemon take care of counting the leftover cores? I expect that when I run nix build --cores 8, I'm telling a build that it may occupy up to 8 cores (regardless of whether it's running GNU Make or a TensorFlow job).
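One way to read this suggestion: the daemon keeps a machine-wide pool of cores and hands each build a share of what is left, instead of every concurrent build receiving the full cores value. The sketch below is hypothetical (CoreBudget is not part of Nix); it only illustrates the accounting.

```python
import threading


class CoreBudget:
    """Hypothetical machine-wide core pool shared by concurrent builds."""

    def __init__(self, total_cores):
        self.free = total_cores
        self.lock = threading.Lock()

    def acquire(self, requested):
        """Grant up to `requested` cores from what is left; 0 means 'wait'."""
        with self.lock:
            granted = min(requested, self.free)
            self.free -= granted
            return granted

    def release(self, granted):
        """Return a finished build's cores to the pool."""
        with self.lock:
            self.free += granted


budget = CoreBudget(16)
a = budget.acquire(8)   # first build gets its full request: 8
b = budget.acquire(12)  # second build gets the remaining 8
c = budget.acquire(4)   # nothing left: 0, so this build should wait
budget.release(a)       # first build finishes; 8 cores are free again
```

In such a scheme the granted value, rather than the configured cores, would be what a build sees as its core count, so the total across concurrent builds never exceeds the pool.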

veprbl, Aug 03 '22 16:08

Removing the load-average limit should only be done if a system-wide jobserver limits the number of jobs. A proof of concept was implemented in https://github.com/NixOS/nixpkgs/pull/143820.

The current setting (load limit equal to the number of jobs) leaves CPU capacity unused when:

  1. The system has a lot of IO-bound tasks (e.g. busy hard drives).
  2. The system runs other CPU-intensive processes.
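On Linux, the load average counts both runnable tasks and tasks in uninterruptible sleep (state D, typically blocked on disk IO), so a machine with busy disks can report a high load while its CPUs sit idle. The sketch below separates the two by reading the state field of /proc/<pid>/stat. It is an approximation: it inspects processes rather than individual threads, and count_task_states is a made-up helper.

```python
import os


def count_task_states(proc_root="/proc"):
    """Count processes that are runnable (R) or in uninterruptible sleep (D).

    Both states contribute to the Linux load average, but only R tasks
    actually want CPU time. Threads are not inspected individually.
    """
    counts = {"R": 0, "D": 0}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # The command name is wrapped in parentheses and may itself contain
        # spaces or ')', so the state field follows the *last* ')'.
        state = stat[stat.rfind(")") + 2]
        if state in counts:
            counts[state] += 1
    return counts
```

If most of the load comes from D tasks, the CPUs are largely free even though make -lN sees a load above its limit.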

I think this PR can be seen as a simple way to get more control over system utilization.

ck3d, Aug 03 '22 18:08

The PoC in https://github.com/NixOS/nixpkgs/pull/143820 sets both -j and -l to ${NIX_BUILD_CORES}. Even with a jobserver running, will Make properly ask for tokens if the load average exceeds NIX_BUILD_CORES?

centromere, Aug 03 '22 19:08

> The downside of this approach is that every concurrent Nix derivation will use up to 8 parallel jobs, without regard to load.

If load-limit defaults to the value of cores, nothing changes. However, on machines with mixed workloads it is highly desirable to remove this limitation (see https://github.com/NixOS/nixpkgs/pull/174473).
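A sketch of the flag selection discussed in this PR, assuming load-limit defaults to cores and that 0 means "do not pass -l" (the function name and the use of None for "unset" are hypothetical; the real wiring goes through the Nix setting and the nixpkgs builder):

```python
def make_parallel_flags(cores, load_limit=None):
    """Return the -j/-l flags a builder would pass to GNU Make.

    cores      -- the Nix `cores` setting (exported to builds as NIX_BUILD_CORES)
    load_limit -- the proposed `load-limit` setting; None means unset,
                  0 means 'do not pass -l at all'
    """
    if load_limit is None:
        load_limit = cores  # default: same -jN -lN behaviour as today
    flags = [f"-j{cores}"]
    if load_limit > 0:
        flags.append(f"-l{load_limit}")
    return flags


print(make_parallel_flags(8))                 # ['-j8', '-l8']  (unchanged default)
print(make_parallel_flags(8, load_limit=0))   # ['-j8']         (load check disabled)
print(make_parallel_flags(8, load_limit=24))  # ['-j8', '-l24']
```

Keeping the default equal to cores is what makes the change backwards compatible: only users who set load-limit explicitly see different Make flags.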

markuskowa, Aug 16 '22 20:08

We're open to moving the jobserver prototype we made in https://github.com/NixOS/nixpkgs/pull/143820 into Nix itself. Unfortunately, the GNU jobserver protocol isn't universally supported; GHC, for example, uses semaphores instead of pipes, and we can't support both the GNU protocol and SysV-semaphore-like protocols with the same implementation without kernel support (i.e., FUSE or other drivers).
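For context, the GNU Make jobserver is a pipe pre-loaded with one-byte tokens: each participating make implicitly owns one job slot, and must read a token from the pipe before starting each additional job, writing the same byte back when that job finishes. A minimal in-process sketch of that token pool follows (PipeTokenPool is a made-up name; a real jobserver also shares the file descriptors with child processes, which is omitted here).

```python
import os


class PipeTokenPool:
    """Token pool in the style of the GNU Make jobserver (sketch only).

    The pipe starts with (jobs - 1) tokens; the holder's first job runs on
    its implicit slot, and every further concurrent job needs one token.
    """

    def __init__(self, jobs):
        if jobs < 1:
            raise ValueError("jobs must be at least 1")
        self.read_fd, self.write_fd = os.pipe()
        os.set_blocking(self.read_fd, False)  # let try_acquire() fail fast
        os.write(self.write_fd, b"+" * (jobs - 1))

    def try_acquire(self):
        """Take one token, or return None if all job slots are in use."""
        try:
            return os.read(self.read_fd, 1)
        except BlockingIOError:
            return None

    def release(self, token):
        """Hand the token back (the same byte that was read)."""
        os.write(self.write_fd, token)

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)


pool = PipeTokenPool(3)  # 3 jobs: 1 implicit slot + 2 tokens in the pipe
t1 = pool.try_acquire()  # b'+'
t2 = pool.try_acquire()  # b'+'
t3 = pool.try_acquire()  # None: a 4th concurrent job must wait
pool.release(t1)
```

A client that expects a semaphore, like the GHC case mentioned above, cannot take tokens from this pipe directly, which is the incompatibility being described.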

pennae, Aug 19 '22 14:08

I'm not a cgroups expert by any means and I haven't tried too hard, but I couldn't find a satisfying tool in there. Limiting RAM could be one way, as I think RAM exhaustion is the main risk of overly aggressive parallelism here. As for CPUs, cgroups can restrict processes to a particular subset of the system's CPUs (a specific subset rather than a count, sadly). That model seemed hard to apply well here, but I suspect tools like nproc would then at least detect and use the size of that subset.
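The CPU-subset restriction is visible from userspace through the process's affinity mask, which is what nproc counts. A small sketch (usable_cpus is a made-up helper) contrasting the machine's CPU count with the CPUs the current process may actually use:

```python
import os


def usable_cpus():
    """Number of CPUs this process may run on.

    A cgroup cpuset (or taskset) narrows the affinity mask, so this follows
    the subset, much like nproc does. Falls back to the total CPU count on
    platforms without sched_getaffinity().
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


print("system CPUs:", os.cpu_count())
print("usable CPUs:", usable_cpus())
```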

vcunat, Apr 28 '23 16:04

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-04-28-nix-team-meeting-minutes-50/27698/1

nixos-discourse, Apr 28 '23 17:04