Nix 2.21.x -> 2.22.x `download buffer is full` on CentOS 7

Open MatthewCroughan opened this issue 9 months ago • 15 comments

Describe the bug

On CentOS 7, I ran into an issue with Nix versions 2.21.x through 2.22.x: the builtin fetchers, such as those triggered by running nix build nixpkgs#hello, fill up a download buffer and then fail to fetch.

Downgrading to Nix 2.18.1 or 2.18.2 makes this issue stop.

When running nix build nixpkgs#hello, nix fails to download the GitHub tarball. Running with -vvv shows the following:

download thread waiting for 100 ms
download buffer is full; going to sleep

Eventually, the download fails and the following error is emitted:

error:
       … while fetching the input 'github:NixOS/nixpkgs/nixpkgs-unstable'

       error: cannot get archive member name: truncated gzip input

Steps To Reproduce

  1. Get CentOS 7
  2. Install Nix via the nixos.org installer
  3. Get nix 2.21.x or 2.22.x
  4. nix build nixpkgs#hello

Expected behavior

For Nix to successfully download and unpack the tarball.

nix-env --version output

$ nix-env --version
nix-env (Nix) 2.21.2

Additional context

I'm sorry I can't provide more helpful steps to reproduce the bug, though I'm happy to help if anyone can instruct me on what to run.

Priorities

Add :+1: to issues you find important.

MatthewCroughan avatar Apr 30 '24 20:04 MatthewCroughan

maybe related to https://github.com/NixOS/nix/issues/10630

MatthewCroughan avatar May 03 '24 19:05 MatthewCroughan

@MatthewCroughan have you found a workaround?

bodokaiser avatar Jul 08 '24 07:07 bodokaiser

@bodokaiser Other than using a Nix release prior to this, no. I wanted to perform a time-consuming git bisect, but wouldn't do that unless paid to do so, due to the esoteric nature of the regression affecting ancient Linuxes that I don't have much motivation to touch otherwise.

MatthewCroughan avatar Jul 08 '24 09:07 MatthewCroughan

@MatthewCroughan How did you perform the downgrade? Did you uninstall nix and just install an older nix, or could you use nix upgrade-nix?

Did you disable SELinux? Are you using the multi-user install? Is there a proxy in your network?

(I also have multiple problems including the cache's SSL due to a proxy - trying to disentangle them)

bodokaiser avatar Jul 08 '24 10:07 bodokaiser

I am having similar issues but on a different setup. I am running Docker on a Fedora host. Everything is fine on the host, but in the Docker container (devpod) nix builds are really slow. They don't fail, they just take really long. The logs (-vvvvv) show hundreds of lines like this:

download thread waiting for 100 ms
download thread waiting for 100 ms
download thread waiting for 100 ms
download buffer is full; going to sleep
download buffer is full; going to sleep
download thread waiting for 100 ms
download buffer is full; going to sleep

I have not verified that downgrading makes things better but right now I am on 2.23. I am really not sure but maybe the issue is related: #11249
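To put a number on "hundreds of lines", the two messages can be counted from a captured verbose log. This is only a sketch: the function name summarise_stalls is made up for illustration, and it matches nothing beyond the two message strings quoted above.

```shell
#!/bin/sh
# Count the two download-stall messages in a verbose Nix log read from stdin.
# summarise_stalls is a hypothetical helper name, not part of Nix.
summarise_stalls() {
    awk '
        /download buffer is full; going to sleep/ { full++ }
        /download thread waiting for/            { waiting++ }
        END {
            printf "buffer-full: %d\n", full
            printf "thread-waiting: %d\n", waiting
        }
    '
}

# Usage (assumes the verbose output goes to stderr):
#   nix build nixpkgs#hello -vvvvv 2>&1 | summarise_stalls
```

Comparing the counts before and after a Nix upgrade gives a rough way to tell whether the stalls went away or just became less frequent.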

lentilus avatar Aug 07 '24 13:08 lentilus

Have you checked whether you have enough space for your cache dir? That was the limitation for me in the end.

bodokaiser avatar Aug 07 '24 13:08 bodokaiser

Can you elaborate on that? Do you mean the store directory or is there an additional cache that I don't know about? On my host I have about 100GiB of free disk. The logs from devpod tell me that the container is started using the following command:

19:49:05 debug Running docker command: docker run --sig-proxy=false --mount type=bind,src=/home/lentilus/git/2ndpod,dst=/workspaces/2ndpod -u root -e DEVPOD=true -e REMOTE_CONTAINERS=true -l dev.containers.id=2ndpod-def-e4a1c -l devcontainer.metadata=[{"id":"ghcr.io/devcontainers/features/common-utils:2"},{"id":"ghcr.io/devcontainers/features/git:1"},{"remoteUser":"vscode"},{"entrypoint":"/usr/local/share/nix-entrypoint.sh"},{"onCreateCommand":{"":["sudo chsh -s /usr/bin/zsh $USER"]}}] -l devpod.user=root -d --entrypoint /bin/sh vsc-2ndpod-1e65f:devpod-1a149c9eba5e1e523404f67734dea86e -c echo Container started
trap "exit 0" 15
/usr/local/share/nix-entrypoint.sh
exec "$@"
while sleep 1 & wait $!; do :; done -

I think this means that the container should have access to the full 100GiB, because there is no flag that suggests otherwise... And nixpkgs#hello should not be all that big after all.

lentilus avatar Aug 07 '24 18:08 lentilus

Yes, sure. In addition to /nix/store, nix uses ~/.cache. In our case, the user directories were on a network drive that limits the total size of a user directory to 10 GB. Since you are using nix inside a Docker container, it could very well be that you have some space restrictions. Maybe you can mount ~/.cache to the host system.

For us, we changed the env variable XDG_CACHE_HOME (double check if this is the correct name, I am not at home right now to check) to point to the local hard drive instead of the user's home directory on the network share.
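A rough way to check this is to resolve the cache directory and look at the free space on the filesystem behind it. The sketch below assumes Nix follows the usual XDG convention (XDG_CACHE_HOME if set, otherwise ~/.cache, with a nix subdirectory); the helper names and the relocation path in the final comment are hypothetical.

```shell
#!/bin/sh
# Resolve the cache directory, assuming the XDG convention applies:
# $XDG_CACHE_HOME/nix if set, otherwise $HOME/.cache/nix.
nix_cache_dir() {
    printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/nix"
}

# Print the free space in KiB on the filesystem that holds (or would hold)
# the given path, walking up to the nearest existing parent directory.
free_kib() {
    dir=$1
    while [ ! -d "$dir" ]; do
        dir=$(dirname "$dir")
    done
    # -P forces the POSIX one-line-per-filesystem format; column 4 is "Available".
    df -Pk "$dir" | awk 'NR == 2 { print $4 }'
}

echo "cache dir: $(nix_cache_dir)"
echo "free KiB:  $(free_kib "$(nix_cache_dir)")"

# To relocate the cache to a local disk (hypothetical path):
#   export XDG_CACHE_HOME=/var/tmp/$USER/cache
```

If the reported free space is small compared to the size of an unpacked nixpkgs checkout, pointing XDG_CACHE_HOME at a roomier local disk is worth trying before anything else.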

bodokaiser avatar Aug 07 '24 20:08 bodokaiser

They don't fail, just take really long. The logs (-vvvvv) show hundreds of lines like this:

download thread waiting for 100 ms
download thread waiting for 100 ms
download thread waiting for 100 ms
download buffer is full; going to sleep

@lentilus This sounds like it could be solved by https://github.com/NixOS/nix/pull/11171, i.e. master or one of the backports.

And nixpkgs#hello should not be all that big after all.

Nixpkgs itself is pretty big though, and it's written to a cache that's implemented by means of a git repo in ~/.cache/nix/tarball-cache/.

roberth avatar Aug 08 '24 13:08 roberth

Thanks @bodokaiser for the details! Unfortunately the size of the cache dir did not seem to be the issue. This may be a stupid question, but is there any way around building nix from source to check if #11171 fixes it? The commit is quite recent and not present in any of the releases, right?

lentilus avatar Aug 11 '24 15:08 lentilus

It's in Nix 2.24, available as nixVersions.latest in the nixos-unstable channel. 2.23-maintenance has not been tagged for this yet.
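One way to try this without building from source is to check the installed version and, if it predates 2.24, run a newer Nix temporarily from nixpkgs. This is a sketch under a few assumptions: the nix-command and flakes experimental features are enabled, the flake reference github:NixOS/nixpkgs/nixos-unstable stands in for "the nixos-unstable channel", and version_at_least is a made-up helper that compares only major.minor.

```shell
#!/bin/sh
# Succeed if dotted version $1 is at least $2, comparing major.minor only.
# version_at_least is a hypothetical helper, not part of Nix.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        if (h[1] + 0 != w[1] + 0) exit !(h[1] + 0 > w[1] + 0)
        exit !(h[2] + 0 >= w[2] + 0)
    }'
}

if command -v nix >/dev/null 2>&1; then
    # "nix --version" prints something like "nix (Nix) 2.21.2".
    current=$(nix --version | awk '{ print $NF }')
    if version_at_least "$current" 2.24; then
        echo "nix $current should already include the fix"
    else
        echo "nix $current predates 2.24; a newer one can be tried with:"
        echo "  nix shell github:NixOS/nixpkgs/nixos-unstable#nixVersions.latest"
    fi
fi
```

Inside the shell opened by the suggested command, nix --version should report the newer release, and the failing build can be retried there without touching the system-wide installation.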

roberth avatar Aug 11 '24 20:08 roberth

#11171 fixed it. Thank you so much @bodokaiser and @roberth !

lentilus avatar Aug 12 '24 14:08 lentilus

@MatthewCroughan could you also give it a try?

roberth avatar Aug 12 '24 15:08 roberth

@roberth Although downloading seems to be solved, and the log line "download buffer is full" is no longer spammed, another issue has occurred. Nix claims the download is finished, and then begins extracting the tarball, but this process hangs indefinitely and does not respond to ^C signals.

[nix-shell:~]$ nix shell github:nixos/nixpkgs#hello -vvvv
evaluating file '<nix/derivation-internal.nix>'
evaluating derivation 'github:nixos/nixpkgs#hello'...
using cache entry 'file:{"name":"source","store":"/nix/store","url":"https://api.github.com/repos/nixos/nixpkgs/commits/HEAD"}' -> '{"etag":"W/\"414dbf039c1e0d4a25053b1d34fdbe18369df6ea0068f219bf6ffbd1c10f25b5\"","storePath":"0h76h2iw2l6y92xzmrrsb5mkvb4z26nc-source","url":"https://api.github.com/repos/nixos/nixpkgs/commits/HEAD"}'
ignoring the client-specified setting 'extra-platforms', because it is a restricted setting and you are not a trusted user
ignoring the client-specified setting 'system-features', because it is a restricted setting and you are not a trusted user
performing daemon worker op: 11
acquiring write lock on '/nix/var/nix/temproots/4762'
performing daemon worker op: 1
using cache entry 'file:{"name":"source","store":"/nix/store","url":"https://api.github.com/repos/nixos/nixpkgs/commits/HEAD"}' -> '{"etag":"W/\"414dbf039c1e0d4a25053b1d34fdbe18369df6ea0068f219bf6ffbd1c10f25b5\"","url":"https://api.github.com/repos/nixos/nixpkgs/commits/HEAD"}', '/nix/store/0h76h2iw2l6y92xzmrrsb5mkvb4z26nc-source'
HEAD revision for 'github:nixos/nixpkgs/HEAD' is 0f1d78c2761069a83de99581ed24533d930f1232
did not find cache entry for 'gitRevToTreeHash:{"rev":"0f1d78c2761069a83de99581ed24533d930f1232"}'
unpacking 'github:nixos/nixpkgs/0f1d78c2761069a83de99581ed24533d930f1232' into the Git cache...
downloading 'https://github.com/nixos/nixpkgs/archive/0f1d78c2761069a83de99581ed24533d930f1232.tar.gz'...
starting download of https://github.com/nixos/nixpkgs/archive/0f1d78c2761069a83de99581ed24533d930f1232.tar.gz
finished download of 'https://github.com/nixos/nixpkgs/archive/0f1d78c2761069a83de99581ed24533d930f1232.tar.gz'; curl status = 0, HTTP status = 200, body = 44453545 bytes, duration = 0.28 s
download thread shutting down

MatthewCroughan avatar Aug 14 '24 22:08 MatthewCroughan

@MatthewCroughan Maybe extraction was just very very slow - could you try #11330? It speeds up extraction significantly on hosts with limited I/O operations per second.

If that doesn't solve the problem, I think we'll need a stack trace from the running process using gdb.
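For reference, a stack trace of a hung process can be captured non-interactively along these lines. This is a sketch: dump_backtraces is a made-up helper name, and which process to attach to (the nix client or, on a multi-user install, the daemon worker) is an assumption to verify on the affected machine.

```shell
#!/bin/sh
# Dump backtraces of every thread in each given PID, without an interactive
# gdb session. dump_backtraces is a hypothetical helper name.
dump_backtraces() {
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found in PATH" >&2
        return 1
    fi
    for pid in "$@"; do
        echo "== backtrace for PID $pid =="
        gdb -p "$pid" -batch -ex 'thread apply all bt' 2>&1
    done
}

# Usage, run as a user allowed to ptrace the process (often root):
#   dump_backtraces $(pgrep -x nix) > nix-hang-backtrace.txt
```

Without debug symbols the frames may show only addresses, but even then the library names in the trace usually indicate where the process is stuck.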

roberth avatar Aug 22 '24 13:08 roberth

@roberth Sadly it doesn't seem to. It would be nice to reproduce this in a VM test or something. The issue is the same as before: unpacking hangs indefinitely and doesn't respond to ^C. I'll ping you about the stack trace.

MatthewCroughan avatar Aug 23 '24 00:08 MatthewCroughan