nixos-rebuild silently crashes out of memory without updating

Open obadz opened this issue 6 years ago • 17 comments

I hesitated to file this one on nixpkgs.

On a VM with 512 MB of RAM, I've seen the following happen:

$ nixos-rebuild boot --upgrade
unpacking channels...
created 2 symlinks in user environment
building Nix...
building the system configuration...
these derivations will be built:
  /nix/store/xqh93bd85ks37l9b30rwa3d4p99qqa2a-system-path.drv
  /nix/store/46lgkbf29k9wj7bcrxxfq1sk7pqhqpq3-unit-polkit.service.drv
[…]
these paths will be fetched (55.46 MiB download, 63.54 MiB unpacked):
  /nix/store/0ksg3q70h5n4x1v70gvghz45xax6w52n-nixos-version
  /nix/store/7gbd1as2whimg6a0d6rfdis5c9syxsl2-linux-4.14.97
[…]
copying path '/nix/store/0ksg3q70h5n4x1v70gvghz45xax6w52n-nixos-version' from 'https://cache.nixos.org'...
copying path '/nix/store/gywc473i8ahighmsj9s6kfik5by9x69a-kernel-modules' from 'https://cache.nixos.org'...
building '/nix/store/5k76blki8zxjcbr3qp9h5jfcin21h918-etc-nixos.conf.drv'...
building '/nix/store/mr6731c960n1j6vj99slmrhqjpy7bafw-etc-os-release.drv'...
[…]
collision between `/nix/store/kkdknnfhqmkb8p9pmmww4xivz6aa9w9f-inetutils-1.9.4/bin/hostname' and `/nix/store/00bgd045z0d4icpbc2yyz4gx48ak44la-net-tools-1.60_p20170221182432/bin/hostname'
collision between `/nix/store/kkdknnfhqmkb8p9pmmww4xivz6aa9w9f-inetutils-1.9.4/bin/dnsdomainname' and `/nix/store/00bgd045z0d4icpbc2yyz4gx48ak44la-net-tools-1.60_p20170221182432/bin/dnsdomainname'
created 7798 symlinks in user environment

If you don't look closely, everything appears to be fine. But…

$ echo $?
137
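
For context, shells conventionally report a status above 128 when a command was terminated by a signal, so 137 reads as 128 + 9. A minimal sketch of decoding such a status; `explain_status` is a hypothetical helper name, not part of nixos-rebuild:

```shell
# Hypothetical helper: decode a shell exit status.
# By convention, a status above 128 means "terminated by signal (status - 128)".
explain_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        echo "terminated by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

explain_status 137   # prints "terminated by signal 9" (signal 9 is SIGKILL)
```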

A bit of investigation reveals the OOM killer kicked in:

Jan 06 16:53:39 hostname kernel: update-mime-dat invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Jan 06 16:53:39 hostname kernel: update-mime-dat cpuset=/ mems_allowed=0
Jan 06 16:53:39 hostname kernel: CPU: 0 PID: 15023 Comm: update-mime-dat Not tainted 4.14.86 #1-NixOS
Jan 06 16:53:39 hostname kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
Jan 06 16:53:39 hostname kernel: Call Trace:
Jan 06 16:53:39 hostname kernel:  dump_stack+0x5c/0x85
Jan 06 16:53:39 hostname kernel: [10007]     0 10007    32862      145      10       3        0             0 nixos-rebuild
Jan 06 16:53:39 hostname kernel: [10008]     0 10008   110701    84161     216       3        0             0 nix-build
Jan 06 16:53:39 hostname kernel: [10010]     0 10010    65529     5743      64       3        0             0 nix-daemon
Jan 06 16:53:39 hostname kernel: [14702] 30001 14702     4146      311      11       2        0             0 bash
Jan 06 16:53:39 hostname kernel: [15023] 30001 15023    17141    11334      37       2        0             0 update-mime-dat
Jan 06 16:53:39 hostname kernel: Out of memory: Kill process 10008 (nix-build) score 653 or sacrifice child
Jan 06 16:53:39 hostname kernel: Killed process 10008 (nix-build) total-vm:442804kB, anon-rss:336644kB, file-rss:0kB, shmem-rss:0kB
Jan 06 16:53:39 hostname kernel: oom_reaper: reaped process 10008 (nix-build), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
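
To spot this kind of kill without reading the whole log, one option is to filter the kernel messages for the "Killed process" line. The sketch below assumes the wording shown in the log above (other kernel versions may phrase it differently); `oom_victims` is a hypothetical helper name:

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "PID NAME" for each process the OOM killer terminated.
# Assumes the "Killed process <pid> (<name>)" wording seen in the log above.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Example with one line taken from the log above; prints "10008 nix-build".
printf '%s\n' 'Jan 06 16:53:39 hostname kernel: Killed process 10008 (nix-build) total-vm:442804kB, anon-rss:336644kB, file-rss:0kB, shmem-rss:0kB' | oom_victims
```

On a live system the input would typically come from the kernel log, for example `journalctl -k | oom_victims`.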

More often than not, it seems to be this command (https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/config/xdg/mime.nix#L27) that pushes memory usage over the limit and causes nix-build to be killed.

Ain't nothing wrong with that but my main source of grief is that the output looks so benign that I assume the command succeeded. Not sure why I don't see an OOM error message.

Nixpkgs rev: 135a7f9604c
Nix version: 2.1.3

obadz avatar Feb 03 '19 00:02 obadz

I usually stop several services before doing the rebuild (nixos-rebuild switch).

The problem is that sometimes even that doesn't help (e.g. when I add a new package that needs building and the build consumes a lot of memory), and then those services (e.g. httpd, MySQL) stay down.

karantan avatar Jul 29 '20 08:07 karantan

And what's the solution here? I'm unable to complete a rebuild on a system with 3 GB of available RAM. All services are stopped, etc., and it always ends in an OOM kill.

Which brings up another question: is Nix unusable on systems with less than 8 GB of RAM? And exactly how do we rebuild/switch those systems?

voobscout avatar Dec 15 '20 12:12 voobscout

For now, can you not make a swapfile of a few GB? My Pinebook Pro has 4GB of RAM, and adding a 4GB swapfile allows building the kernel.
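
As a rough sketch of that workaround, the helper below only prints the commands that would create and enable a swap file, so they can be reviewed before being run as root. The function name and the `/swapfile` path are assumptions for illustration; the 4096 MiB size matches the 4GB mentioned above.

```shell
# Hypothetical helper: print the commands that create and enable a swap file,
# so they can be reviewed before being run as root.
# The /swapfile path used below is an illustrative assumption.
swapfile_commands() {
    path=$1
    size_mib=$2
    echo "dd if=/dev/zero of=$path bs=1M count=$size_mib"
    echo "chmod 600 $path"
    echo "mkswap $path"
    echo "swapon $path"
}

swapfile_commands /swapfile 4096
```

A swap file enabled this way does not survive a reboot, and copy-on-write filesystems such as Btrfs need extra steps. On NixOS the persistent route is most likely the `swapDevices` option in configuration.nix, though applying it requires a rebuild to succeed first.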

lordcirth avatar Dec 15 '20 14:12 lordcirth

@lordcirth Unfortunately swap isn't an option here: it's a low-spec HDD that is already busy. Swap solves it on some systems, although upgrading manually is awkward, but others simply can't use swap.

voobscout avatar Dec 15 '20 15:12 voobscout

It's possible to reduce memory consumption a lot by only including the modules that you use in the baseModules: https://github.com/nixos/nixpkgs/blob/a3f0ef0a1fe3bc4a0d9eb176fcac246634d413c2/nixos/lib/eval-config.nix#L16-L17

Unfortunately nixos-rebuild doesn't support passing that parameter during evaluation. What is possible, though, is to use the new --flake flag, which allows precise control over the NixOS evaluation.

The flake.nix would look something like this:

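{
  outputs = { self, nixpkgs }: {
    nixosConfigurations.myhost = import "${nixpkgs}/nixos/lib/eval-config.nix" {
      # Assumption: adjust to the target platform. Without it, eval-config.nix
      # defaults to builtins.currentSystem, which pure flake evaluation lacks.
      system = "x86_64-linux";
      baseModules = [
        # import all the modules you use here
      ];
      modules = [ (import ./myhost/configuration.nix) ];
    };
  };
}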

And then to build, you would use nixos-rebuild --flake .#myhost build.

This should work in theory. It's probably going to take a while to populate the baseModules with minimal requirements.

zimbatm avatar Dec 15 '20 16:12 zimbatm

So instead of using lib.nixosSystem, which has all the modules in nixpkgs loaded(?), this creates the same structure, but with an empty list of baseModules for you to populate? Would you just build this repeatedly and add every module whose absence breaks the build?

lordcirth avatar Dec 15 '20 16:12 lordcirth

You got it. It will be quite painful to build the full list as module inter-dependencies are not being tracked, but that's the best (only?) way to reduce memory usage. As NixOS gets more and more services defined, the memory usage keeps growing.

zimbatm avatar Dec 15 '20 17:12 zimbatm

I marked this as stale due to inactivity.

stale[bot] avatar Jun 14 '21 21:06 stale[bot]

It depends on your Nix config, of course, but mine failed with 1 GB of RAM (even with an additional 1 GB of swap). Swap definitely helped, though, after I changed it to 4 GB.

asdf8dfafjk avatar Aug 27 '21 08:08 asdf8dfafjk

On that note, I'm curious and want to do a mini survey of sorts. My system has 8 GB of RAM, and the nixos-rebuild switch step takes about 30 minutes before printing the list of things to build and things to download. How does it go for you? Please note your RAM and the time before the list of things to download is printed. Additionally, if you can also report the sustained load (for me it's about 2.5 of 4 CPUs and 91-94% of RAM in use), that would be nice.

asdf8dfafjk avatar Aug 28 '21 10:08 asdf8dfafjk

I marked this as stale due to inactivity.

stale[bot] avatar Apr 16 '22 11:04 stale[bot]

I was just bitten by this on a 1GB VPS. The solution was to add 2GB of swap. That was the easy part; the hard part was figuring out why nixos-rebuild wasn't starting my services (answer: it was silently being killed by the OOM killer).

dunnl avatar Sep 26 '22 02:09 dunnl

Faced the same issue on a raspberry pi with 1GB RAM when trying to rebuild using a flake-based configuration; adding 2GB swap fixed this for me too.

pweth avatar Sep 20 '23 18:09 pweth

Just ran into this when building a DigitalOcean droplet, any chance we can get some kind of warning that this is happening? The silent failure is the real pain here.
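
Until something like that exists, a small wrapper can make the failure harder to miss. This is only a sketch: `run_loudly` is a hypothetical name, and it assumes the kill surfaces as exit status 137, as in the original report.

```shell
# Hypothetical wrapper (not part of nixos-rebuild): run a command and print
# a warning if it ended with status 137 (128 + SIGKILL), which is how an
# OOM kill appeared in the original report.
run_loudly() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 137 ]; then
        echo "WARNING: '$*' was killed by SIGKILL (status 137)." >&2
        echo "Check the kernel log for OOM killer messages, e.g. journalctl -k" >&2
    fi
    return "$status"
}
```

It would be used as `run_loudly nixos-rebuild switch`. The wrapper cannot tell an OOM kill apart from any other SIGKILL, so the kernel log remains the place to confirm the cause.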

SamuelKurtzer avatar Apr 08 '24 12:04 SamuelKurtzer

Getting this same issue when using WSL; I had to raise the memory available to WSL to 3GB. The fact that it fails silently is very annoying.

Doomwhite avatar Jul 23 '24 17:07 Doomwhite

I have also encountered the same issue when trying to provision a VM on Proxmox. I started with 512MB of memory, then tried 1GB, and then 3GB.

I wanted to run NixOS on a small VM that would only run Nginx, but requiring at least 3GB of RAM just to do an update is far from ideal.

mentos1386 avatar Sep 01 '24 09:09 mentos1386

I can agree. Using nixos-rebuild switch on a 1GB RAM VPS is impossible without using at least 2GB of swap. The whole system will just freeze. NixOS doesn't seem to be an option for people that want declarative config on a low end VPS.

Fijxu avatar Oct 18 '24 01:10 Fijxu

5 years later, with a large enough config, --upgrade managed to throw me into the same problem even with 16 GB of RAM. I had to sit through the painful wait twice before I realized I needed to close the browser and let it cook.

walywest avatar Feb 17 '25 23:02 walywest

Ran into this issue after an hour of installing and configuring a 1GB VPS. I couldn't figure out why the rebuild was not doing anything until I noticed the 137 exit code and the OOM killer logs. Increasing my swapfile from 512MB to 3GB fixed the issue, though I could probably get away with less, since I noticed a peak of only ~800MB of swap usage during the rebuild.
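
For anyone wanting to size their swap the same way, swap usage can be read from /proc/meminfo as SwapTotal minus SwapFree. A minimal sketch follows; `swap_used_mib` is a hypothetical helper name, and the sample numbers in the example are made up for illustration:

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the swap currently in use, in MiB (SwapTotal - SwapFree, both in kB).
swap_used_mib() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { printf "%d\n", (total - free) / 1024 }'
}

# Example with illustrative numbers: 3 GiB of swap with 800 MiB in use.
printf 'SwapTotal: 3145728 kB\nSwapFree: 2326528 kB\n' | swap_used_mib   # prints 800
```

On a live system, running `swap_used_mib < /proc/meminfo` periodically in a second terminal during the rebuild gives a rough idea of the peak.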

I would love to see some fix that doesn't involve increasing swap though.

ItzDerock avatar Feb 27 '25 03:02 ItzDerock