
Memory usage in eval

Open SaltyKitkat opened this issue 2 years ago • 19 comments

Just evaluating my NixOS profile takes about 1G of RAM, which is kind of too much for me. And when running something like nixpkgs-review, Nix just takes more and more RAM.

Is this by design?

Or is there any way I can reduce the memory usage?

❯ time -v nix eval --raw .#nixosConfigurations.SaltyKitkat.config.system.build.toplevel
/nix/store/v0dh21kn18a74d6gk6ayvcawprcywd65-nixos-system-SaltyKitkat-23.11.20230629.4bc72ca	Command being timed: "nix eval --raw .#nixosConfigurations.SaltyKitkat.config.system.build.toplevel"
	User time (seconds): 5.28
	System time (seconds): 0.75
	Percent of CPU this job got: 77%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.77
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1046296
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 270506
	Voluntary context switches: 43679
	Involuntary context switches: 152
	Swaps: 0
	File system inputs: 123200
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

nix-env run by nixpkgs-review

Command being timed: "nix-env --extra-experimental-features no-url-literals --option system x86_64-linux -f /home/***/.cache/nixpkgs-review/rev-0df1938e62e6084894afab9846e5a842e0091833/nixpkgs -qaP --xml --out-path --show-trace --no-allow-import-from-derivation"
	User time (seconds): 80.84
	System time (seconds): 3.40
	Percent of CPU this job got: 89%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:34.38
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 10705384
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 18322
	Minor (reclaiming a frame) page faults: 3116626
	Voluntary context switches: 1565
	Involuntary context switches: 884
	Swaps: 0
	File system inputs: 41600
	File system outputs: 40
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
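
To compare the two runs above, the peak memory line can be pulled out of the time -v output and converted to GiB. This is a minimal sketch, assuming the GNU time output format shown above; peak_rss_gib is a hypothetical helper name, not part of Nix or GNU time.

```shell
# Hypothetical helper: read GNU `time -v` output on stdin and print the
# "Maximum resident set size" value converted from kbytes to GiB.
peak_rss_gib() {
  awk -F': ' '/Maximum resident set size/ { printf "%.2f GiB\n", $2 / 1048576 }'
}

# Possible usage (GNU time writes its report to stderr; `command time`
# is used here on the assumption that the shell's own `time` keyword
# does not support -v):
#   command time -v nix eval --raw .#nixosConfigurations.SaltyKitkat.config.system.build.toplevel 2>&1 >/dev/null | peak_rss_gib
```

Fed the two "Maximum resident set size" lines above, this reports roughly 1 GiB for the nix eval run and a bit over 10 GiB for the nix-env run.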

SaltyKitkat avatar Jul 01 '23 08:07 SaltyKitkat

We are aware that Nix evaluation tends to consume significant amounts of memory. Causes and potential causes I'm aware of:

  • in the evaluator and CLI:
    • closures reference their whole scope, even when parts of it aren't referenced
      • https://github.com/NixOS/nix/issues/8285
    • https://github.com/NixOS/nix/issues/5200
    • our coroutine solution might make the gc overly conservative at times. I don't expect this to be significant, but it has not been quantified.
    • conservative gc without compaction (also not quantified)
  • in the expressions:
    • overridability means that we have to hold on to non-final values, which adds some overhead
    • NixOS' current architecture doesn't scale to a large number of "built in" services; see https://github.com/NixOS/rfcs/pull/22

roberth avatar Jul 01 '23 14:07 roberth

I want to add that the Boehm garbage collector is a conservative collector that does not allow heap compaction.

I was hoping to spark some interest in assessing the mark-region algorithm as a possible new garbage collection algorithm for Nix, because it allows for heap compaction. There are some existing implementations in Rust (immix) and C (whippet). In particular, the whippet implementation seems relevant to Nix because it has zero dependencies and a Boehm-compatible API.

jsoo1 avatar Jul 01 '23 15:07 jsoo1

@jsoo1 Interesting! Would you be interested in giving whippet a try? I've added notes about gc.

roberth avatar Jul 01 '23 17:07 roberth

Would you be interested in giving whippet a try? I've added notes about gc.

@roberth sweet! Yes I would be interested! I was planning on setting aside some time for it if there seemed to be interest from the team.

jsoo1 avatar Jul 01 '23 17:07 jsoo1

Let's move the discussion of replacing the GC over to https://github.com/NixOS/nix/issues/8626

roberth avatar Jul 01 '23 17:07 roberth

Thanks for the summary!

Since there are already memory leaks, I'm wondering whether the GC is working as expected. Maybe just improving the GC makes no sense if most of the memory usage comes from leaked memory.

SaltyKitkat avatar Jul 05 '23 11:07 SaltyKitkat

I don't expect the GC itself to be broken, and I don't expect many leaks from it being conservative either. It manages to collect an amount about equal to the final heap size in a typical evaluation by ofborg (i.e. half of allocations are collected). It is hard to know how much it should be able to collect, though. That makes your question a good one, which could perhaps be answered with a combination of profiling and debugging, although we might need custom tooling to really start relating expressions to the heap and gc.

roberth avatar Jul 05 '23 13:07 roberth

I ran into this while upgrading from NixOS 23.05 to 23.11 on my cloud VM with 2G of RAM. nix-build itself took 1G of that, and also there were some server services running, taking up about 500M, leaving only 500M for the actual derivation builds. Naturally it OOM'd kind of a lot.

I worked around that by taking the derivation file paths from the "these NNN derivations will be built:" output, pasting them into a file and running xargs -n1 nix-build < derivations.txt. Not sure if the -n1 also helped, but it feels like some gains could be had here by separating the two phases. I will happily be corrected if I'm working off incorrect assumptions, but it appears to me that the memory usage of nix-build is all related to Nix expressions, which at this point in the build process are entirely unneeded, since all the required information exists in the .drv files. Maybe the Nix expression evaluation could happen in a separate process that then terminates before nix-build moves on to building the derivations, or the Nix expressions could be allocated in an arena that is freed all at once after evaluation is done, or something like that?

That would not solve the original problem, and looking into a different GC still sounds valuable, but it might make the problem less acute for a portion of affected users.
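
The workaround described above could be scripted roughly as follows. This is only a sketch: extract_drvs is a hypothetical helper name, and it assumes the build plan lists one indented /nix/store/….drv path per line, as in the message quoted above.

```shell
# Hypothetical helper: read Nix's build plan on stdin and print only the
# .drv store paths, one per line, with surrounding whitespace removed.
extract_drvs() {
  sed -n 's|^[[:space:]]*\(/nix/store/[^[:space:]]*\.drv\)[[:space:]]*$|\1|p'
}

# Possible usage, with evaluation and building as two separate steps.
# The expression passed to nix-build is an assumption; use whatever you
# normally build:
#   nix-build '<nixpkgs/nixos>' -A system --dry-run 2>&1 | extract_drvs > derivations.txt
#   xargs -n1 nix-build < derivations.txt
```

Each nix-build call in the second step is handed a .drv path directly, so it should not need to evaluate the Nix expressions again.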

majewsky avatar Dec 09 '23 11:12 majewsky

Regarding freeing the expressions, a starting point would be https://github.com/NixOS/nix/pull/5747#issuecomment-1615939700, but also making sure to destruct EvalState and the expression cache.

If you have really small machines to deploy to, you might want to use nixos-rebuild --target-host. That will neither build nor evaluate on the target machine.
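
As a rough sketch of what that can look like for a flake-based configuration: deploy_remote, the flake output name and the SSH destination below are all hypothetical and need to be replaced with your own.

```shell
# Hypothetical wrapper: evaluate and build on the local machine, then
# copy the result to the target host and activate it there.
deploy_remote() {
  flake_ref="$1"   # e.g. .#small-vm (hypothetical flake output)
  target="$2"      # e.g. root@small-vm.example.org (hypothetical SSH destination)
  nixos-rebuild switch --flake "$flake_ref" --target-host "$target"
}

# Possible usage:
#   deploy_remote .#small-vm root@small-vm.example.org
```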

roberth avatar Dec 09 '23 14:12 roberth

nixos-rebuild --target-host is a good hint and I will take that under consideration. But for what it's worth, that does not solve OOM during auto-upgrades as triggered by system.autoUpgrade.enable = true; as far as I can see.

majewsky avatar Dec 12 '23 17:12 majewsky

CC @astro FYI

While learning Nix and Nix flakes, this command froze my dear, and at that point mostly idle, 16GB laptop, eating >10GB:

nix flake show microvm

shortened output:

github:astro/microvm.nix/7bd9255e535c8cbada7f574ddd3bcf3bfa5e1eae
├───apps
│   ├───aarch64-linux
│   │   ├───graphics: app
│   │   ├───qemu-vnc: app
│   │   ├───vm: app
│   │   └───waypipe-client: app
│   └───x86_64-linux
│       ├───graphics: app
│       ├───qemu-vnc: app
│       ├───vm: app
│       └───waypipe-client: app
├───defaultTemplate: template: Flake with MicroVMs
├───hydraJobs
│   ├───aarch64-linux
│   │   ├───cloud-hypervisor-overlay-shutdown-command: derivation 'microvm-test-shutdown-command'
[...SNIP...]
│   │   └───vm-stratovirt-iperf: derivation 'vm-stratovirt-iperf'
error: interrupted by the user
nix flake show microvm  58,38s user 4,46s system 92% cpu 1:07,85 total

The output is actually from a run after I found https://github.com/rfjakob/earlyoom - You might want to recommend this nice tool somewhere!

Please don't let this issue get sidetracked by me. I just thought it might be interesting to mention earlyoom in this issue and to have an example of how to reliably eat a lot of memory.

thkoch2001 avatar Dec 21 '23 08:12 thkoch2001

https://github.com/NixOS/rfcs/pull/163 may reduce memory use for NixOS, by virtue of not having to load service modules that aren't used.

It's one solution among potentially others, such as #9650 for cases like nix flake show microvm.

roberth avatar Dec 21 '23 15:12 roberth

Any memory usage improvements are very welcome. My CI runner with 16 GB RAM now also occasionally triggers the OOM killer when evaluating my NixOS configurations.

blitz avatar Jul 01 '24 15:07 blitz

I seem to be encountering this too. A nix flake show in the microvm repo consumed a whopping ~24G of RAM.

ciacon avatar Jul 19 '24 11:07 ciacon

I am encountering this as well. nixpkgs-review sometimes fills up my whole RAM (16 GB) during evaluation, when usage beforehand is around 5 GB, smh.

JohnRTitor avatar Sep 16 '24 10:09 JohnRTitor

https://github.com/NixOS/nix/pull/13407 brings significant improvements in terms of memory usage (~20% less maximum heap size). Combined with other optimizations, the upcoming Nix 2.30 release should use roughly 25% less heap than 2.29. I'm afraid that this is the best we can do without a drastic redesign of the evaluator. That PR already uses quite complex bit-packing tricks to pack data more tightly in the current data model and I can't think of another such optimization.

xokdvium avatar Jul 02 '25 22:07 xokdvium

Slightly more memory usage improvements in https://github.com/NixOS/nix/pull/13919.

xokdvium avatar Sep 06 '25 08:09 xokdvium

and I can't think of another such optimization.

Ok, seems like I was very much wrong and I can think of another such optimization. https://github.com/NixOS/nix/pull/13987 shaves 37% off the heap size of the nixpkgs eval CI for the x86_64-linux system. This is 6.5GB less memory.

xokdvium avatar Sep 15 '25 00:09 xokdvium

I'm also working on this in #14088

Radvendii avatar Nov 24 '25 14:11 Radvendii