[Error] Nvidia integration mounts files as read-only, which prevents installing some packages inside the container.
Describe the bug
I found this problem when trying to install the libboost1.74-dev package in an Ubuntu 22.04 container, created with Nvidia integration (--nvidia flag) on an Ubuntu 24.04 host.
When creating a container with the --nvidia flag, distrobox-init will mount as read-only all files under /usr/ that have "nvidia" in the name.
In my case, I had the libboost1.83-dev package installed in the host (Ubuntu 24.04), which installs some header files under /usr/include/boost/... with the string "nvidia" in the file name, that are unrelated to the nvidia driver or libraries. The problem appears in the container (Ubuntu 22.04) if I try to install boost (libboost1.74-dev), because it will try to overwrite those header files that were mounted as read-only. This is the error from apt:
Unpacking libboost1.74-dev:amd64 (1.74.0-14ubuntu3) ...
dpkg: error processing archive /tmp/apt-dpkg-install-Zcdiwt/18-libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb (--unpack):
unable to make backup link of './usr/include/boost/compute/detail/nvidia_compute_capability.hpp' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
I believe those files should not be mounted in the container, even if they have "nvidia" in the file name.
It is possible to pass an option to dpkg to ignore those files, but this is not a solution, because the files installed in the host and the container are from different versions of boost.
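The over-matching described above can be reproduced without a GPU. The following is a minimal sketch, assuming the integration selects files with a case-insensitive name search comparable to `find /usr -iname '*nvidia*'`; the exact command in distrobox-init may differ. The function name `list_nvidia_candidates`, the sample tree, and the library file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list every regular file under a root directory whose
# name contains "nvidia" in any letter case. -iname is not strictly POSIX but
# is supported by GNU, BSD and busybox find.
list_nvidia_candidates() {
    find "$1" -type f -iname '*nvidia*' | sort
}

# Build a throwaway tree that mimics a host with one driver library,
# one boost header and one unrelated library.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/usr/lib" \
         "$demo_root/usr/include/boost/compute/detail"
touch "$demo_root/usr/lib/libnvidia-ml.so.1" \
      "$demo_root/usr/include/boost/compute/detail/nvidia_compute_capability.hpp" \
      "$demo_root/usr/lib/libc.so.6"

# Both "nvidia" files are listed, although only one belongs to the driver.
list_nvidia_candidates "$demo_root"

rm -rf "$demo_root"
```

The boost header is selected purely because of its file name, which is the behaviour this report is about.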
To Reproduce
- [Prerequisite] Have a host distro with boost installed. I only tested this with Ubuntu 24.04 in the host and 22.04 in the container, but it should be possible to replicate it with other distros.
- Create a new box with Nvidia integration:
distrobox create --image docker.io/library/ubuntu:22.04 --name boost_test --nvidia
- Try to install boost libraries inside the container:
distrobox enter boost_test
apt install -y libboost1.74-dev
The installation fails because of the files mounted by distrobox-init.
Expected behavior
When using the --nvidia flag, the files unrelated to the nvidia driver / libraries should not be mounted in the container.
It should be possible to install boost in the container, even if boost is already installed in the host. This issue might also happen with different packages (not only boost) if they have any file with "nvidia" in the file name.
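To check whether a given container is affected, it can help to list the mount points that contain "nvidia" in their path. The following is a rough sketch, assuming the files show up as individual entries in the container's mount table (the usual `/proc/mounts` layout: device, mount point, type, options). The helper name `list_nvidia_mounts` is hypothetical and not part of distrobox.

```shell
#!/bin/sh
# Hypothetical helper: read mount entries in /proc/mounts format from stdin
# and print the mount points whose path contains "nvidia" (any case),
# prefixed with "ro" or "rw" taken from the mount options field.
list_nvidia_mounts() {
    awk 'tolower($2) ~ /nvidia/ {
        mode = ($4 ~ /(^|,)ro(,|$)/) ? "ro" : "rw"
        print mode, $2
    }'
}

# Inside a container, inspect the live mount table if it is readable.
if [ -r /proc/self/mounts ]; then
    list_nvidia_mounts < /proc/self/mounts
fi
```

Any path listed here that a package also ships is a likely candidate for the kind of install failure shown above.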
Desktop (please complete the following information):
- Are you using podman, docker or lilipod? Which version of podman, docker or lilipod? podman 4.9.3
- Which version of distrobox? 1.7.0
- Which host distribution? Ubuntu 24.04
- How did you install distrobox? From Ubuntu's universe repository through apt.
Additional context
I am aware of issue #1054, but this is different, as I am not trying to install CUDA or another Nvidia library, just the boost libraries. I think it might not be a good idea to mount every file that has "nvidia" in its name, but I have no idea what the best way to solve this would be.
Hi @Butakus
I'm in the same situation as you: the same host and the same image in the container.
Preparing to unpack .../libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb ...
Unpacking libboost1.74-dev:amd64 (1.74.0-14ubuntu3) ...
dpkg: error processing archive /var/cache/apt/archives/libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb (--unpack):
unable to make backup link of './usr/include/boost/compute/detail/nvidia_compute_capability.hpp' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
/var/cache/apt/archives/libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
I think we could add something like a .gitignore: a configuration file (e.g. .distrobox-ignore) in the user's config directory, with the same semantics as gitignore, that explicitly lists the host files to exclude. That would make sense if we want to exclude some host files instead of globbing everything named "nvidia".
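As a rough illustration of that idea, the sketch below filters a list of candidate paths against patterns read from an ignore file. Everything here is hypothetical: distrobox has no such file as far as this thread shows, the function name `filter_ignored` is invented, and the matching is deliberately simpler than real gitignore semantics (each pattern is a plain shell glob matched against the whole path).

```shell
#!/bin/sh
# Hypothetical filter: read candidate paths from stdin and drop those that
# match any pattern in the ignore file given as $1.
# Simplification: patterns are shell globs matched against the full path via
# `case`; blank lines and lines starting with "#" are skipped.
filter_ignored() {
    ignore_list=$1
    while IFS= read -r path; do
        skip=0
        while IFS= read -r pattern; do
            case $pattern in ''|'#'*) continue ;; esac
            # $pattern is left unquoted on purpose so it acts as a glob.
            case $path in $pattern) skip=1; break ;; esac
        done < "$ignore_list"
        if [ "$skip" -eq 0 ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Example ignore file excluding the two directories mentioned in this thread.
demo_ignore=$(mktemp)
cat > "$demo_ignore" <<'EOF'
# files that only mention nvidia by name
/usr/include/boost/*
/usr/share/cmake/*
EOF

printf '%s\n' \
    /usr/lib/libnvidia-ml.so.1 \
    /usr/include/boost/compute/detail/nvidia_compute_capability.hpp \
    /usr/share/cmake/Modules/Compiler/NVIDIA-CUDA.cmake |
    filter_ignored "$demo_ignore"

rm -f "$demo_ignore"
```

In a `case` pattern `*` also matches `/`, so a single trailing `*` covers a whole subtree; a real implementation would have to decide how closely to follow gitignore rules.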
I've written a patch that addresses this with a new option:
--nvidia-exclude: directories to be excluded when nvidia is integrated (e.g. /usr/share/cmake)
I think you guys can use --nvidia-exclude /usr/include/boost to exclude boost from nvidia import.
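For readers wondering how a directory exclusion like this can work, here is a minimal sketch using `find ... -prune`. It is not the patch itself, only an assumption about one way to skip a directory during a name-based search; the function name and the sample tree are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: name-based search for "nvidia" that skips one
# directory entirely. $1 = search root, $2 = directory to exclude.
# Note: -path treats $2 as a glob, so it should not contain *, ? or [.
list_nvidia_candidates_excluding() {
    find "$1" -path "$2" -prune -o -type f -iname '*nvidia*' -print | sort
}

# Throwaway tree with a driver library, a boost header and a cmake module.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/usr/lib" \
         "$demo_root/usr/include/boost/compute/detail" \
         "$demo_root/usr/share/cmake/Modules/Compiler"
touch "$demo_root/usr/lib/libnvidia-ml.so.1" \
      "$demo_root/usr/include/boost/compute/detail/nvidia_compute_capability.hpp" \
      "$demo_root/usr/share/cmake/Modules/Compiler/NVIDIA-CUDA.cmake"

# The boost header is no longer listed; the cmake module still is.
list_nvidia_candidates_excluding "$demo_root" "$demo_root/usr/include/boost"

rm -rf "$demo_root"
```

Pruning stops `find` from descending into the excluded directory at all, so nothing below it can be selected for mounting.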
Thanks for the patch, looks like a valid workaround. Does this mean that we will have to include the --nvidia-exclude flag on every distrobox enter command?
Also, as a side question, Would it be possible to implement this the opposite way? I mean, finding what files are created/used by the nvidia driver/libraries and selecting those nvidia files and directories for mounting, instead of using find to grab all "nvidia" files. No one should expect in the default configuration to get some random library mounted in the container because it has files with the "nvidia" name in it.
I don't know if these files and directories are consistent between distributions and/or Nvidia versions, which would make this approach impossible.
Does this mean that we will have to include the --nvidia-exclude flag on every distrobox enter command?
No, actually: the exclusion is only applied when the container is created. In fact, I could not find a podman command to change the mount options after creation.
I don't know if these files and directories are consistent between distributions and/or Nvidia versions, which would make this approach impossible.
I also thought about this before. Maybe we could take advantage of the host's package manager where possible, but the problem is that the Nvidia driver can also be installed manually, without a package manager, and I don't know whether we can get a file list from such an installation.
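A sketch of the package-manager variant, under stated assumptions: the host can produce a file list for the driver package (for example with `dpkg -L PACKAGE` or `rpm -ql PACKAGE`), and only candidates that appear in that list are kept. The function name `keep_package_owned` and the sample paths are hypothetical, and this does nothing for manually installed drivers, which is the limitation described above.

```shell
#!/bin/sh
# Hypothetical filter: keep only the candidate paths (stdin) that also appear
# in a package file list ($1), one absolute path per line.
keep_package_owned() {
    # -F fixed strings, -x whole-line match, -f read patterns from a file.
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -Fxf "$1" || true
}

# Stand-in for the output of a package file listing on the host.
driver_files=$(mktemp)
printf '%s\n' /usr/lib/libnvidia-ml.so.1 /usr/bin/nvidia-smi > "$driver_files"

printf '%s\n' \
    /usr/lib/libnvidia-ml.so.1 \
    /usr/include/boost/compute/detail/nvidia_compute_capability.hpp |
    keep_package_owned "$driver_files"

rm -f "$driver_files"
```

This turns the name-based search into an allow-list: a file is mounted only if the package manager says it belongs to the driver.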
Also, as a side question, Would it be possible to implement this the opposite way? I mean, finding what files are created/used by the nvidia driver/libraries and selecting those nvidia files and directories for mounting, instead of using find to grab all "nvidia" files. No one should expect in the default configuration to get some random library mounted in the container because it has files with the "nvidia" name in it.
However, probably only a small portion of packages conflict with nvidia names, so I think a better way to handle this issue may be to display a prompt that lists all the matching directories and lets users exclude the ones they don't want.
@89luca89
Hi,
The issue is not completely fixed.
The same issue happens with cmake-data on Fedora when a box is created with the --nvidia flag.
distrobox create --image fedora:latest --name fedora-dev --nvidia --home ~/.distrobox/fedora-dev
distrobox enter fedora-dev
sudo dnf install cmake
Error unpacking rpm package cmake-data-3.28.2-1.fc40.noarch
Installing : cmake-3.28.2-1.fc40.x86_64 11/11
error: unpacking of archive failed on file /usr/share/cmake/Modules/Compiler/NVIDIA-CUDA.cmake;670a5d86: cpio: rename failed - Device or resource busy
error: cmake-data-3.28.2-1.fc40.noarch: install failed
Thanks @2aecfff4, I've pushed a new commit that should fix all of these occurrences.
@89luca89 I can confirm that it works perfectly now. Thank you!
I'm not entirely sure that it's related, but I'm having a similar issue.
Trying to install cuda in an Arch container (with a Fedora 41 host system) I get:
Package (4) New Version Net Change
extra/gcc13 13.3.1+r432+gfc8bd63119c0-1 155.52 MiB
extra/gcc13-libs 13.3.1+r432+gfc8bd63119c0-1 0.70 MiB
extra/opencl-nvidia 565.77-3 42.03 MiB
extra/cuda 12.6.3-1 4907.70 MiB
Total Installed Size: 5105.95 MiB
:: Proceed with installation? [Y/n] y
(4/4) checking keys in keyring [#######################################################################################################################] 100%
(4/4) checking package integrity [#######################################################################################################################] 100%
(4/4) loading package files [#######################################################################################################################] 100%
(4/4) checking for file conflicts [#######################################################################################################################] 100%
error: failed to commit transaction (conflicting files)
opencl-nvidia: /usr/lib/libnvidia-opencl.so.1 exists in filesystem
opencl-nvidia: /usr/lib/libnvidia-opencl.so.565.77 exists in filesystem
This happens both with --nvidia and with the nvidia-container-toolkit (that I tried just now).
The box was created with this command:
distrobox-create --init --home "/home/$USER/arch-home" --name "arch" --image quay.io/toolbx/arch-toolbox:latest --additional-packages "base-devel git" --additional-flags "--runtime=nvidia --gpus all"
but it also happened on the regular archlinux:latest image.