[Error] Nvidia integration mounts files as read-only that prevent installing some packages inside the container.

Open Butakus opened this issue 1 year ago • 1 comments

Describe the bug

I found this problem when trying to install the libboost1.74-dev package in an Ubuntu 22.04 container, created with Nvidia integration (--nvidia flag), on an Ubuntu 24.04 host.

When creating a container with the --nvidia flag, distrobox-init will mount as read-only all files under /usr/ that have "nvidia" in the name.
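To get a rough idea of which host files a name-based match like this would pick up, something along these lines can be run on the host. This is only a sketch and not the actual distrobox-init logic: the function name `list_nvidia_named` is hypothetical, and it assumes the match is a case-insensitive search for "nvidia" in the file name.

```shell
#!/bin/sh
# Hypothetical helper: list regular files and symlinks under ROOT whose
# name contains "nvidia" (case-insensitive). This only approximates what
# a name-based match would select; it is not distrobox-init's own code.
list_nvidia_named() {
    root="${1:-/usr}"
    find "$root" \( -type f -o -type l \) -iname '*nvidia*' 2>/dev/null | sort
}
```

For example, `list_nvidia_named /usr/include` limits the search to header files, which is where unrelated matches such as library headers would show up.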

In my case, I had the libboost1.83-dev package installed in the host (Ubuntu 24.04), which installs some header files under /usr/include/boost/... with the string "nvidia" in the file name, that are unrelated to the nvidia driver or libraries. The problem appears in the container (Ubuntu 22.04) if I try to install boost (libboost1.74-dev), because it will try to overwrite those header files that were mounted as read-only. This is the error from apt:

Unpacking libboost1.74-dev:amd64 (1.74.0-14ubuntu3) ...
dpkg: error processing archive /tmp/apt-dpkg-install-Zcdiwt/18-libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb (--unpack):
unable to make backup link of './usr/include/boost/compute/detail/nvidia_compute_capability.hpp' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)

I believe those files should not be mounted in the container, even if they have "nvidia" in the file name.

It is possible to pass an option to dpkg to ignore those files, but this is not a solution, because the files installed in the host and the container are from different versions of boost.

To Reproduce

  1. [Prerequisite] Have a host distro with boost installed. I only tested this with Ubuntu 24.04 in the host and 22.04 in the container, but it should be possible to replicate it with other distros.
  2. Create a new box with Nvidia integration:
distrobox create --image docker.io/library/ubuntu:22.04 --name boost_test --nvidia
  3. Try to install boost libraries inside the container:
distrobox enter boost_test
apt install -y libboost1.74-dev

The installation fails because of the files that distrobox-init mounted as read-only.
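To check from inside the container which "nvidia"-named paths are read-only mounts, the mount table can be inspected. The following is a minimal sketch: the function name `list_ro_nvidia_mounts` is hypothetical, and it assumes the files appear as individual read-only mount points in /proc/self/mountinfo.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose path contains "nvidia"
# (case-insensitive) and that are mounted read-only.
# In /proc/self/mountinfo, field 5 is the mount point and field 6 holds
# the per-mount options (e.g. "ro,nosuid,nodev").
list_ro_nvidia_mounts() {
    mountinfo="${1:-/proc/self/mountinfo}"
    awk 'tolower($5) ~ /nvidia/ && $6 ~ /(^|,)ro(,|$)/ { print $5 }' "$mountinfo"
}
```

Called without arguments inside the box, it reads the live mount table. If the path named in the dpkg error appears in the output, that file is a read-only mount and not a regular file of the container.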

Expected behavior

When using the --nvidia flag, the files unrelated to the nvidia driver / libraries should not be mounted in the container.

It should be possible to install boost in the container, even if boost is already installed in the host. This issue might also happen with different packages (not only boost) if they have any file with "nvidia" in the file name.

Desktop (please complete the following information):

  • Are you using podman, docker or lilipod? Which version of podman, docker or lilipod? podman 4.9.3
  • Which version of distrobox? 1.7.0
  • Which host distribution? Ubuntu 24.04
  • How did you install distrobox? From Ubuntu's universe repository through apt.

Additional context

I am aware of issue #1054, but this is different, as I am not trying to install CUDA or another Nvidia library, just the boost libraries. I think it might not be a good idea to mount all files that are found with "nvidia" in the name, but I have zero idea about the best way to solve this.

Butakus avatar Jul 29 '24 15:07 Butakus

Hi @Butakus

I'm in the same situation as you: the same host and the same image in the container.

Preparing to unpack .../libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb ...
Unpacking libboost1.74-dev:amd64 (1.74.0-14ubuntu3) ...
dpkg: error processing archive /var/cache/apt/archives/libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb (--unpack):
 unable to make backup link of './usr/include/boost/compute/detail/nvidia_compute_capability.hpp' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/libboost1.74-dev_1.74.0-14ubuntu3_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

juanscelyg avatar Aug 02 '24 17:08 juanscelyg

I think we could add something like a .gitignore: a config file (e.g. .distrobox-ignore) in the user's config directory, with the same semantics as .gitignore, that explicitly excludes files. That would make sense if we want to exclude some host files instead of globbing everything named "nvidia".

ToolmanP avatar Sep 10 '24 14:09 ToolmanP

I've written a patch to address this with a new option:

--nvidia-exclude:	directories to be excluded when nvidia is integrated (e.g. /usr/share/cmake)

I think you guys can use --nvidia-exclude /usr/include/boost to exclude boost from nvidia import.
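The effect of a directory-based exclusion like this can be sketched as a filter over the list of candidate paths. This is not the code from the patch: `filter_excluded` is a hypothetical name, and it assumes that a path is excluded when it is one of the given directories or lies anywhere beneath one.

```shell
#!/bin/sh
# Hypothetical sketch of directory-based exclusion: read candidate paths
# on stdin and drop any path located under one of the directories given
# as arguments. Not the code from the patch mentioned above.
filter_excluded() {
    while IFS= read -r path; do
        keep=1
        for dir in "$@"; do
            dir="${dir%/}"   # tolerate a trailing slash
            case "$path" in
                "$dir"|"$dir"/*) keep=0; break ;;
            esac
        done
        if [ "$keep" -eq 1 ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}
```

A possible use would be `find /usr -iname '*nvidia*' | filter_excluded /usr/include/boost /usr/share/cmake`, which keeps everything except the files under those two directories. Matching on a directory boundary means a sibling such as /usr/include/boostfoo is not excluded by accident.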

ToolmanP avatar Sep 11 '24 15:09 ToolmanP

Thanks for the patch, looks like a valid workaround. Does this mean that we will have to include the --nvidia-exclude flag on every distrobox enter command?

Also, as a side question, Would it be possible to implement this the opposite way? I mean, finding what files are created/used by the nvidia driver/libraries and selecting those nvidia files and directories for mounting, instead of using find to grab all "nvidia" files. No one should expect in the default configuration to get some random library mounted in the container because it has files with the "nvidia" name in it. I don't know if these files and directories are consistent between distributions and/or Nvidia versions, which would make this approach impossible.

Butakus avatar Sep 19 '24 10:09 Butakus

> Does this mean that we will have to include the --nvidia-exclude flag on every distrobox enter command?

No, actually. It only happens when the container is created. In fact, I could not find a command to change the mount options after creation with podman.

ToolmanP avatar Sep 23 '24 05:09 ToolmanP

> I don't know if these files and directories are consistent between distributions and/or Nvidia versions, which would make this approach impossible.

I also thought about this before. Maybe we can take advantage of the host's package manager where possible, but the problem is that the Nvidia driver can also be installed manually, without a package manager, and I don't know whether we can get a file list from that driver.
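On a dpkg-based host, the package-manager idea could look roughly like the sketch below. It is an assumption-laden illustration, not a proposal for the actual implementation: `owned_by_nvidia_pkg` is a hypothetical name, the input is expected in `dpkg -S` output format ("package[:arch]: /path"), and a package is treated as Nvidia-related only if its name contains "nvidia".

```shell
#!/bin/sh
# Hypothetical sketch: keep only paths owned by a package whose name
# contains "nvidia". Input is `dpkg -S` style lines, "package[:arch]: /path".
# Files from a manual driver install are unknown to dpkg and would be
# missed, which is the limitation raised above.
owned_by_nvidia_pkg() {
    awk -F': ' 'tolower($1) ~ /nvidia/ { print $2 }'
}
```

One way to feed it would be `find /usr -iname '*nvidia*' -exec dpkg -S {} + 2>/dev/null | owned_by_nvidia_pkg`. A header shipped by a boost package is dropped because its owning package name does not contain "nvidia".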

ToolmanP avatar Sep 23 '24 05:09 ToolmanP

> Also, as a side question, Would it be possible to implement this the opposite way? I mean, finding what files are created/used by the nvidia driver/libraries and selecting those nvidia files and directories for mounting, instead of using find to grab all "nvidia" files. No one should expect in the default configuration to get some random library mounted in the container because it has files with the "nvidia" name in it.

However, probably only a small portion of packages will conflict with nvidia names, so I think a better way to handle this issue may be to display a widget that prints all the directories and lets users exclude some of them.
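The listing step of such a prompt could be approximated by grouping the matching files by parent directory, so that a user sees a short list of directories instead of every file. This is a sketch under assumptions: `summarize_nvidia_dirs` is a hypothetical name, and the case-insensitive "nvidia" name match is only a guess at what would be selected.

```shell
#!/bin/sh
# Hypothetical helper: count files and symlinks with "nvidia" in the name
# (case-insensitive) per parent directory, most populated directory first.
summarize_nvidia_dirs() {
    root="${1:-/usr}"
    find "$root" \( -type f -o -type l \) -iname '*nvidia*' 2>/dev/null |
        sed 's|/[^/]*$||' |   # strip the file name, keep the directory
        sort | uniq -c | sort -rn
}
```

Each output line holds a count followed by a directory, which a user could scan for directories that clearly belong to unrelated packages.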

ToolmanP avatar Sep 23 '24 05:09 ToolmanP

@89luca89 Hi, the issue is not completely fixed. The same problem happens with cmake-data on Fedora when a box is created with the --nvidia flag.

distrobox create --image fedora:latest --name fedora-dev --nvidia --home ~/.distrobox/fedora-dev
distrobox enter fedora-dev
sudo dnf install cmake
Error unpacking rpm package cmake-data-3.28.2-1.fc40.noarch
  Installing       : cmake-3.28.2-1.fc40.x86_64    11/11
error: unpacking of archive failed on file /usr/share/cmake/Modules/Compiler/NVIDIA-CUDA.cmake;670a5d86: cpio: rename failed - Device or resource busy
error: cmake-data-3.28.2-1.fc40.noarch: install failed

2aecfff4 avatar Oct 12 '24 11:10 2aecfff4

Thanks @2aecfff4, I've pushed a new commit that should fix all of these occurrences.

89luca89 avatar Oct 12 '24 13:10 89luca89

@89luca89 I can confirm that it works perfectly now. Thank you!

2aecfff4 avatar Oct 12 '24 13:10 2aecfff4

I'm not entirely sure that it's related but I'm having a similar issue.

Trying to install cuda in an arch container (with Fedora 41 host system) I get:

Package (4)          New Version                  Net Change 

extra/gcc13          13.3.1+r432+gfc8bd63119c0-1   155.52 MiB
extra/gcc13-libs     13.3.1+r432+gfc8bd63119c0-1     0.70 MiB
extra/opencl-nvidia  565.77-3                       42.03 MiB
extra/cuda           12.6.3-1                     4907.70 MiB

Total Installed Size:  5105.95 MiB

:: Proceed with installation? [Y/n] y
(4/4) checking keys in keyring        [######################] 100%
(4/4) checking package integrity      [######################] 100%
(4/4) loading package files           [######################] 100%
(4/4) checking for file conflicts     [######################] 100%
error: failed to commit transaction (conflicting files)
opencl-nvidia: /usr/lib/libnvidia-opencl.so.1 exists in filesystem
opencl-nvidia: /usr/lib/libnvidia-opencl.so.565.77 exists in filesystem

This happens both with --nvidia and with the nvidia-container-toolkit (which I tried just now). The box was created with this command:
distrobox-create --init --home "/home/$USER/arch-home" --name "arch" --image quay.io/toolbx/arch-toolbox:latest --additional-packages "base-devel git" --additional-flags "--runtime=nvidia --gpus all"
but it also happened with the regular archlinux:latest image.

Vodes avatar Jan 06 '25 12:01 Vodes