"No space left on device" but I have plenty of space left on device

Open samuela opened this issue 1 year ago • 5 comments

I was getting this weird error:

[nix-shell:~/nixpkgs]$ nixglhost ipython
Traceback (most recent call last):
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 681, in <module>
    ret = main(args)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 629, in main
    new_env = nvidia_main(cache_dir, host_dsos_paths, args.print_ld_library_path)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 564, in nvidia_main
    cache_paths.append(cache_library_path(p, tmp_cache_dir, cache_dir))
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 449, in cache_library_path
    copy_and_patch_libs(dsos=dsos, dest_dir=d, rpath=rpath_lib_dir)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 331, in copy_and_patch_libs
    shutil.copyfile(dso.fullpath, newpath)
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 267, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 156, in _fastcopy_sendfile
    raise err from None
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 142, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
OSError: [Errno 28] No space left on device: '/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.545.23.08' -> '/run/user/1000/tmpzgwq37ed/nix-gl-host/3076b0246cb1468199a8444860ebaebe5ec5081f85098057a9f4d5b40c3de738/lib/libnvidia-eglcore.so.545.23.08'

[nix-shell:~/nixpkgs]$ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         31G   25G  6.8G  79% /
devtmpfs         7.7G     0  7.7G   0% /dev
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  924K  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/loop0        25M   25M     0 100% /snap/amazon-ssm-agent/7628
/dev/loop1        56M   56M     0 100% /snap/core18/2812
/dev/loop2        64M   64M     0 100% /snap/core20/2105
/dev/loop3        87M   87M     0 100% /snap/lxd/26881
/dev/loop4        87M   87M     0 100% /snap/lxd/26975
/dev/loop5        41M   41M     0 100% /snap/snapd/20671
/dev/nvme0n1p15  105M  6.1M   99M   6% /boot/efi
tmpfs            1.6G   20K  1.6G   1% /run/user/1000

Oddly, it seems to have gone away now, despite not having changed my configuration. Not sure what might be going on, or how to reproduce consistently, but hoping that creating this issue will help to spur discussion.

samuela avatar Feb 09 '24 04:02 samuela

We're using a tmp directory to build the lib cache before moving it to the definitive cache dir. See https://github.com/numtide/nix-gl-host/blob/main/src/nixglhost.py#L558

So potentially, in your case, the tmpdir is created in your /run/user/1000 directory, which is rather small (1.6G). Copying the libs there saturates the tmpfs, hence the error. This tmpdir is then deleted when nix-gl-host exits, emptying it again.

I guess a potential fix would be to check we have enough available space in the tmpfs, and if we don't, use another directory in ~/.cache (XDG_CACHE_HOME).
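A minimal sketch of that fallback, not the project's actual code: the function name `pick_staging_parent` and its parameters are hypothetical, and it assumes the total size of the libraries to copy is already known. It checks the free space where temporary directories would normally be created and falls back to XDG_CACHE_HOME (or ~/.cache) when there is not enough room.

```python
import os
import shutil
import tempfile


def pick_staging_parent(required_bytes, tmp_parent=None, fallback_parent=None):
    """Return a directory with at least required_bytes free for staging the cache.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Default to wherever tempfile would create directories (honours $TMPDIR).
    tmp_parent = tmp_parent or tempfile.gettempdir()
    if shutil.disk_usage(tmp_parent).free >= required_bytes:
        return tmp_parent

    # Not enough room (e.g. a small tmpfs): fall back to the user cache dir.
    if fallback_parent is None:
        fallback_parent = os.environ.get("XDG_CACHE_HOME") or os.path.join(
            os.path.expanduser("~"), ".cache"
        )
    os.makedirs(fallback_parent, exist_ok=True)
    return fallback_parent
```

The result could then be passed as the `dir=` argument of `tempfile.TemporaryDirectory`, so the staging directory is still cleaned up on exit. Note that this sketch does not verify that the fallback itself has enough space.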

picnoir avatar Feb 09 '24 07:02 picnoir

Ah, gotcha. Thanks for explaining, @picnoir! Is there any particular reason to use /run/user/1000 instead of /tmp? Also, why copy the libs instead of symlinking them? Copying seems slower and more error-prone.

samuela avatar Feb 09 '24 14:02 samuela

> Is there any particular reason to use /run/user/1000 instead of /tmp?

We're using TemporaryDirectory, which in turn uses Python's tempfile.mkdtemp, which checks your $TMPDIR environment variable (then $TEMP and $TMP) before falling back to /tmp to decide where to create temporary directories.

I assume you could try to set $TMPDIR to /tmp to get this behavior.
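To see which directory Python's tempfile module would pick for a given $TMPDIR, something like the following can help. It is only a diagnostic sketch: `resolve_tempdir` is a hypothetical helper, and it temporarily changes the process environment, so it is meant for experimenting rather than for use inside a larger program.

```python
import os
import tempfile


def resolve_tempdir(tmpdir_value):
    """Return the directory tempfile would use if TMPDIR were set to tmpdir_value."""
    old = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = tmpdir_value
    # tempfile caches its answer in tempfile.tempdir; clear it to force a new lookup.
    tempfile.tempdir = None
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the previous environment and drop the cached value again.
        if old is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = old
        tempfile.tempdir = None


if __name__ == "__main__":
    print("current default:", tempfile.gettempdir())
    print("with TMPDIR=/tmp:", resolve_tempdir("/tmp"))
```

If the requested directory does not exist or is not writable, tempfile silently moves on to the next candidate, so the printed path may differ from the value passed in.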

> Also why copy libs instead of symlinking them? Copying seems slower and more error-prone?

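We need to patch their rpath, which means modifying the library files themselves, so we work on copies rather than on the host's originals. More details here: https://github.com/numtide/nix-gl-host/blob/main/INTERNALS.md#a-hard-problem-to-solve-and-a-partial-fix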
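As a rough illustration of why a symlink is not enough, here is a sketch of a copy-then-patch step. It assumes the `patchelf` command-line tool is available on PATH and uses its `--set-rpath` option; the helper names are hypothetical and this is not the code from nixglhost itself.

```python
import shutil
import subprocess
from pathlib import Path


def patchelf_command(lib_path, rpath):
    """Build the patchelf invocation that rewrites a library's rpath."""
    return ["patchelf", "--set-rpath", str(rpath), str(lib_path)]


def copy_and_patch(src, dest_dir, rpath):
    """Copy src into dest_dir, then patch the copy so the original stays untouched."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copy = dest_dir / Path(src).name
    shutil.copyfile(src, copy)
    # Assumes the patchelf binary is on PATH; raises CalledProcessError on failure.
    subprocess.run(patchelf_command(copy, rpath), check=True)
    return copy
```

A symlink would still resolve to the unmodified host file with its original rpath, which is why the patched copy has to exist as a real file in the cache.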

picnoir avatar Feb 09 '24 14:02 picnoir

ahhh gotcha, ok thanks!

samuela avatar Feb 09 '24 16:02 samuela

export TMPDIR=/tmp worked for me.

leonardschneider avatar Mar 11 '24 22:03 leonardschneider