nix-gl-host
"No space left on device" but I have plenty of space left on device
I was getting this weird error:
[nix-shell:~/nixpkgs]$ nixglhost ipython
Traceback (most recent call last):
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 681, in <module>
    ret = main(args)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 629, in main
    new_env = nvidia_main(cache_dir, host_dsos_paths, args.print_ld_library_path)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 564, in nvidia_main
    cache_paths.append(cache_library_path(p, tmp_cache_dir, cache_dir))
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 449, in cache_library_path
    copy_and_patch_libs(dsos=dsos, dest_dir=d, rpath=rpath_lib_dir)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 331, in copy_and_patch_libs
    shutil.copyfile(dso.fullpath, newpath)
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 267, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 156, in _fastcopy_sendfile
    raise err from None
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 142, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
OSError: [Errno 28] No space left on device: '/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.545.23.08' -> '/run/user/1000/tmpzgwq37ed/nix-gl-host/3076b0246cb1468199a8444860ebaebe5ec5081f85098057a9f4d5b40c3de738/lib/libnvidia-eglcore.so.545.23.08'
[nix-shell:~/nixpkgs]$ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         31G   25G  6.8G  79% /
devtmpfs         7.7G     0  7.7G   0% /dev
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  924K  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/loop0        25M   25M     0 100% /snap/amazon-ssm-agent/7628
/dev/loop1        56M   56M     0 100% /snap/core18/2812
/dev/loop2        64M   64M     0 100% /snap/core20/2105
/dev/loop3        87M   87M     0 100% /snap/lxd/26881
/dev/loop4        87M   87M     0 100% /snap/lxd/26975
/dev/loop5        41M   41M     0 100% /snap/snapd/20671
/dev/nvme0n1p15  105M  6.1M   99M   6% /boot/efi
tmpfs            1.6G   20K  1.6G   1% /run/user/1000
Oddly, it seems to have gone away now, even though I haven't changed my configuration. I'm not sure what might be going on or how to reproduce it consistently, but I'm hoping that opening this issue will help spur discussion.
We're using a temporary directory to build the library cache before moving it to the final cache dir. See https://github.com/numtide/nix-gl-host/blob/main/src/nixglhost.py#L558
So, in your case, the tmpdir is probably being created in your /run/user/1000 directory, a tmpfs that is rather small (1.6G). Copying the libs there fills up the tmpfs, hence the error. The tmpdir is then deleted when nix-gl-host exits, which frees the space again; that is why df -h shows /run/user/1000 as almost empty afterwards.
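A minimal Python sketch of the stage-then-move pattern described above, to show where the space actually gets consumed. `build_cache` and `populate` are hypothetical names, not nix-gl-host's real functions, and reusing an existing cache dir is an assumption (that the dir name encodes its contents, like the hash in the path above).

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def build_cache(cache_dir: Path, populate: Callable[[Path], None]) -> Path:
    """Hypothetical sketch: fill a staging dir under the temp root, then move
    it to cache_dir only once it is complete."""
    if cache_dir.exists():
        # Assumption: the dir name encodes its contents (e.g. a hash),
        # so an existing cache can be reused as-is.
        return cache_dir
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    # The staging dir lives under tempfile.gettempdir(), i.e. $TMPDIR if set.
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "cache"
        staging.mkdir()
        # Copy/patch the libraries here; on a full tmpfs this raises
        # OSError(ENOSPC), as in the traceback above.
        populate(staging)
        # shutil.move renames when both paths share a filesystem and falls
        # back to copy-then-delete otherwise (e.g. tmpfs -> ~/.cache).
        shutil.move(str(staging), str(cache_dir))
    # Leaving the with-block deletes the (now empty) temp dir.
    return cache_dir
```

Because the staging copy lives only for the duration of the `with` block, a temp filesystem that overflows mid-copy looks empty again once the process exits.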
I guess a potential fix would be to check whether the tmpfs has enough free space and, if it doesn't, fall back to a directory under ~/.cache ($XDG_CACHE_HOME).
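The proposed fix could look roughly like the sketch below. It is an illustration under assumptions, not nix-gl-host code: `required_bytes`, `pick_staging_root`, the 10% headroom and the `nix-gl-host/tmp` subdirectory are all hypothetical choices. It uses `shutil.disk_usage` to read free space and follows the XDG rule that an unset or empty `$XDG_CACHE_HOME` means `~/.cache`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def required_bytes(paths) -> int:
    """Total size of the files we are about to copy."""
    return sum(os.path.getsize(p) for p in paths)


def pick_staging_root(needed: int, margin: float = 1.1) -> Path:
    """Hypothetical sketch: prefer the default temp dir ($TMPDIR, else /tmp),
    but fall back to $XDG_CACHE_HOME (default ~/.cache) when the temp
    filesystem lacks enough free space (plus 10% headroom by default)."""
    tmp_root = Path(tempfile.gettempdir())
    if shutil.disk_usage(tmp_root).free >= needed * margin:
        return tmp_root
    # Per the XDG Base Directory spec, unset or empty means $HOME/.cache.
    xdg = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    fallback = Path(xdg) / "nix-gl-host" / "tmp"  # hypothetical subdirectory
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback
```

The result would then be passed as `tempfile.TemporaryDirectory(dir=pick_staging_root(required_bytes(dso_paths)))`. A side benefit of falling back to ~/.cache is that staging and the final cache dir would usually sit on the same filesystem, so the final move is a cheap rename.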
Ah, gotcha. Thanks for explaining, @picnoir! Is there any particular reason to use /run/user/1000 instead of /tmp? Also, why copy the libs instead of symlinking them? Copying seems slower and more error-prone.
Is there any particular reason to use /run/user/1000 instead of /tmp?
We're using TemporaryDirectory, which in turn calls Python's tempfile.mkdtemp (not the glibc function of the same name). That uses your $TMPDIR variable to figure out where to create temporary directories, falling back to $TEMP, $TMP and then standard locations such as /tmp.
You could try setting $TMPDIR to /tmp to get that behavior.
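To check where `TemporaryDirectory()` will land for a given `$TMPDIR`, here is a small sketch using only the standard library. `temp_root_for` is a hypothetical helper. It resets `tempfile.tempdir` because `gettempdir()` caches its first answer for the life of the process.

```python
import os
import tempfile


def temp_root_for(tmpdir_value: str) -> str:
    """Return the parent dir that tempfile.TemporaryDirectory() would use
    when $TMPDIR is set to tmpdir_value, restoring the old state afterwards."""
    old_env = os.environ.get("TMPDIR")
    old_cached = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = tmpdir_value
        tempfile.tempdir = None  # gettempdir() caches its result; force a re-check
        with tempfile.TemporaryDirectory() as d:
            return os.path.dirname(d)
    finally:
        tempfile.tempdir = old_cached
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env
```

From a shell, the equivalent is simply exporting the variable before running nixglhost (`export TMPDIR=/tmp`). A fresh process picks it up without any cache reset.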
Also why copy libs instead of symlinking them? Copying seems slower and more error-prone?
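We need to patch their rpath. The rpath is stored inside the ELF file itself, so changing it means rewriting the file, and a symlink would just point back at the unpatched host library. More details here: https://github.com/numtide/nix-gl-host/blob/main/INTERNALS.md#a-hard-problem-to-solve-and-a-partial-fix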
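A rough sketch of copy-then-patch for a single library, assuming the `patchelf` CLI (its `--set-rpath` option rewrites the RPATH of the given file). `copy_and_patch` is a hypothetical helper, not the actual `copy_and_patch_libs` from the traceback. The `run` parameter exists only so the command can be inspected without patchelf installed.

```python
import shutil
import subprocess
from pathlib import Path


def copy_and_patch(dso: Path, dest_dir: Path, rpath: str, run=subprocess.run) -> Path:
    """Hypothetical sketch: copy a host library into dest_dir, then rewrite
    the RPATH of the copy. The original under /usr/lib is never touched."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    newpath = dest_dir / dso.name
    # A real copy, not a symlink: the RPATH lives inside the ELF file, so
    # only a file we own can carry a different one.
    shutil.copyfile(dso, newpath)
    run(["patchelf", "--set-rpath", rpath, str(newpath)], check=True)
    return newpath
```

This is also why the staging directory needs room for the full size of every copied driver library, which is what overflowed the 1.6G tmpfs here.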
ahhh gotcha, ok thanks!
export TMPDIR=/tmp worked for me.