ansible-role-nvidia-driver

Impossible to install drivers on fresh Ubuntu focal

Open BarthV opened this issue 3 years ago • 1 comment

The Ansible role fails with:

TASK [nvidia.nvidia_driver : add key] ok: [gpu1]

TASK [nvidia.nvidia_driver : add repo] fatal: [gpu1]: FAILED! => { "changed": false, "msg": "Failed to update apt cache: E:Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/by-hash/SHA256/751939d95516afc289908a19e447f0acc1506367f72ed356431a2b1a469cc8ca 404 Not Found [IP: 152.199.20.126 443], E:Some index files failed to download. They have been ignored, or old ones used instead."}

$ sudo apt-key list
/etc/apt/trusted.gpg
--------------------
pub   rsa4096 2016-06-24 [SC]
      AE09 FE4B BD22 3A84 B2CC  FCE3 F60F 4B3D 7FA2 AF80
uid           [ unknown] cudatools <[email protected]>
$ sudo apt-get update
[...]
Ign:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release [697 B]
Get:8 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release.gpg [836 B]
Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages
Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages
Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages
Err:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages
  404  Not Found [IP: 152.199.20.126 443]
Fetched 836 B in 2s (392 B/s)
Reading package lists... Done
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/by-hash/SHA256/751939d95516afc289908a19e447f0acc1506367f72ed356431a2b1a469cc8ca  404  Not Found [IP: 152.199.20.126 443]
E: Some index files failed to download. They have been ignored, or old ones used instead.

It seems that the whole by-hash directory is missing from the NVIDIA repo: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/by-hash/

As a workaround, I had to change the apt_repository task on my side, adding the by-hash=no option to avoid the error above:

$ cat /etc/apt/sources.list.d/developer_download_nvidia_com_compute_cuda_repos_ubuntu2004_x86_64.list 
deb [by-hash=no] http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /
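
For anyone who wants to reproduce this workaround in a playbook, here is a minimal sketch of an equivalent task. The task name is made up for illustration and this is not the role's own task; it only assumes the standard apt_repository module and the source line shown above.

```yaml
# Sketch of the workaround, not the role's actual task.
# by-hash=no tells apt to fetch index files by name instead of from by-hash/.
- name: add repo (workaround, by-hash disabled)  # hypothetical task name
  ansible.builtin.apt_repository:
    repo: "deb [by-hash=no] http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /"
    state: present
    update_cache: true
```

With no explicit filename, the module derives the file name under /etc/apt/sources.list.d/ from the repo URL, which is consistent with the file name in the listing above.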

Is this a transient issue on the NVIDIA repo side? Or should we permanently add this by-hash=no fix to the Ansible templates?

Thanks

BarthV, Aug 31 '21 09:08

Hi, I had the same issue. The solution is in https://github.com/NVIDIA/ansible-role-nvidia-driver/pull/42

You have to use

https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /

instead of

http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /
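
As a rough illustration, a task using the https URL could look like the sketch below. This is an assumption about how one might apply the change by hand, not a copy of the pull request's diff, and the task name is hypothetical.

```yaml
# Sketch only: same repository, https scheme instead of http.
- name: add repo (https URL)  # hypothetical task name
  ansible.builtin.apt_repository:
    repo: "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /"
    state: present
    update_cache: true
```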

quicky2000, Sep 01 '21 14:09