
NVIDIA repository to list did not update to 20.04

Open vikashg opened this issue 3 years ago • 7 comments

1. Issue or feature description

When I follow the steps described here, I am not able to update the nvidia-docker sources to 20.04. The sources still show:

deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /

2. Steps to reproduce the issue

The Ubuntu release is:

Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal

nvidia-smi shows | NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4 |

NVIDIA GPU used: A4000

Followed the steps here

vikashg avatar Dec 29 '21 15:12 vikashg

Seeing the same and unable to get runtime=nvidia working

Alec-Schneider avatar Jan 02 '22 02:01 Alec-Schneider

Same here. I followed NVIDIA's installation guide (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#nvidia-drivers).

/etc/apt/sources.list.d/nvidia-docker.list includes

deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /

Trying to install nvidia-docker2 results in the following error:

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-docker2 : Depends: nvidia-container-toolkit (>= 1.7.0-1) but 1.5.1-1pop1~1627998766~20.04~9847cf2 is to be installed
E: Unable to correct problems, you have held broken packages.

Altering the sources list from .../ubuntu18.04/... to .../ubuntu20.04/... or to .../debian10/... didn't help.

RealTehreal avatar Jan 03 '22 13:01 RealTehreal

First, nvidia-docker2 should be backward compatible: the 18.04 packages should work on Ubuntu 20.04, 21.04 and 21.10, so in my opinion your issues lie somewhere else. @RealTehreal, you have to temporarily disable the Pop repositories in Pop!_Shop if you want to use nvidia-docker2 straight from NVIDIA and not from System76. Uncheck (disable, not remove) all Pop repositories in extra sources, run sudo apt update, and then install nvidia-docker2.

PQLLUX avatar Jan 10 '22 10:01 PQLLUX

Hi @vikashg. The packages used on Ubuntu 18.04 are also applicable to Ubuntu 20.04. We provide a top-level symlink to the .list file, but do not change the contents in this case. The file as shown is correct.

As @PQLLUX points out, it seems that you are also pulling these packages from a different repository and would have to ensure that the NVIDIA repository (specifically https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH)) has a higher priority than other repositories on the system.
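One possible way to give the NVIDIA repository a higher priority is an apt pin on its origin host. The following is only a minimal sketch, not a fix confirmed in this thread: the file name `nvidia-pin`, the `PIN_FILE` and `PIN_PRIORITY` variables, and the priority value 1001 are all assumptions chosen for illustration. The script writes the pin to a scratch location so it can be reviewed before being copied into `/etc/apt/preferences.d/`.

```shell
#!/bin/sh
# Sketch: generate an apt pin that prefers packages served from nvidia.github.io.
# The output path and the priority value are assumptions, not taken from the thread.
PIN_FILE="${PIN_FILE:-${TMPDIR:-/tmp}/nvidia-pin}"  # hypothetical scratch file name
PIN_PRIORITY="${PIN_PRIORITY:-1001}"                 # above 1000, so apt may even downgrade to this origin

# Write the three-line preferences stanza (unquoted EOF so the priority expands).
cat > "$PIN_FILE" <<EOF
Package: *
Pin: origin nvidia.github.io
Pin-Priority: $PIN_PRIORITY
EOF

# Print the result for review before installing it.
cat "$PIN_FILE"
```

After reviewing the file and copying it to `/etc/apt/preferences.d/` with sudo, running `sudo apt update` followed by `apt-cache policy nvidia-container-toolkit` should show which repository supplies the candidate version. If another repository on the system is pinned even higher, the priority would need to be adjusted accordingly.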

elezar avatar Jan 10 '22 15:01 elezar

So, we resolved this issue. We had ordered our workstation from Lambda systems, and it came with a wrapper on top that was causing the issue mentioned by @PQLLUX. Also, here is a nice discussion which helped.

vikashg avatar Jan 11 '22 21:01 vikashg

When following the apt repo enabling step to deploy to x86-64 Ubuntu 20.04, the 18.04 repo is used, even though a 20.04 repo exists. Is this expected, or did the 18.04 repo config file get deployed to 20.04 by mistake? When I manually change ubuntu18.04 to ubuntu20.04 in the apt repo config file, I'm able to hit a 20.04-specific repo.

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ echo $distribution
ubuntu20.04
$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | head -1
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
$ sudo apt-get update | grep libnvidia
Hit:9 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease
$ sudo sed -i 's|18.04|20.04|' /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ sudo apt-get update | grep libnvidia
Hit:10 https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64  InRelease

qhaas avatar Mar 26 '22 17:03 qhaas

@qhaas as mentioned in https://github.com/NVIDIA/nvidia-docker/issues/1584#issuecomment-1009011055, the presence of the 18.04 string in the repo list for ubuntu20.04 is correct. At present the packages are shared between these two distributions and we only publish a single copy, relying on redirects to serve them from both: https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 and https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64.

elezar avatar Mar 28 '22 05:03 elezar