deep-learning-containers

[bug] apt update errors due to failing NVIDIA certificate verification

Open austinmw opened this issue 3 years ago • 4 comments

Checklist

  • [X] I've prepended issue tag with type of change: [bug]
  • [X] (If applicable) I've attached the script to reproduce the bug
  • [X] (If applicable) I've documented below the DLC image/dockerfile this relates to
  • [X] (If applicable) I've documented below the tests I've run on the DLC image
  • [X] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html

Concise Description: Unable to run apt update in SageMaker DLCs due to failing NVIDIA certificate verification.

To reproduce: nvidia-docker run -it --rm 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker "apt update"

DLC image/dockerfile: Multiple, for example: 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker

Current behavior:

root@a7301cb95566:/# apt update
Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  InRelease                                                   
Err:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release                                                                 
  Certificate verification failed: The certificate is NOT trusted. The certificate issuer is unknown.  Could not handshake: Error in the certificate verification. [IP: 152.195.19.142 443]
Err:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  Release                                                     
  Certificate verification failed: The certificate is NOT trusted. The certificate issuer is unknown.  Could not handshake: Error in the certificate verification. [IP: 152.195.19.142 443]
Get:5 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]                                                                                           
Get:6 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]                                                     
Get:7 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu focal InRelease [23.8 kB]             
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]                              
Get:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]                                   
Get:11 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu focal/main amd64 Packages [16.5 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]           
Get:15 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1139 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1216 kB]    
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.3 kB]    
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2188 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1154 kB]
Get:20 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [25.8 kB]
Get:21 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1773 kB]         
Get:22 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [26.0 kB]   
Get:23 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [51.2 kB]          
Get:24 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [870 kB]    
Reading package lists... Done                              
W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/InRelease: No system certificates available. Try installing ca-certificates.
W: https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/InRelease: No system certificates available. Try installing ca-certificates.
W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/Release: No system certificates available. Try installing ca-certificates.
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/Release: No system certificates available. Try installing ca-certificates.
E: The repository 'https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
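To narrow a long log like this one down to the repositories that actually failed, a small filter over the captured output can help. This is a minimal POSIX shell sketch that assumes apt's `Err:<n> <URL> <suite>` line format as seen in the log above; `apt_failed_repos` is a hypothetical helper name, not part of apt.

```shell
#!/bin/sh
# Sketch: print the unique repository URLs that apt reported on "Err:" lines.
# Reads captured `apt update` output on stdin.
apt_failed_repos() {
    # Matching lines look like: "Err:3 https://host/path  Release".
    # Field 1 is "Err:<n>", field 2 is the repository URL.
    awk '/^Err:[0-9]+ / { print $2 }' | sort -u
}

# Example, inside the container:
#   apt update 2>&1 | apt_failed_repos
```

For the log above, this should print only the two developer.download.nvidia.com URLs, which confirms that the Ubuntu and PPA sources are fine and only the NVIDIA repositories are affected.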

Expected behavior: apt update completes successfully.

Additional context: This largely prevents using these images as a base for building custom images.
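The W: lines in the log report "No system certificates available. Try installing ca-certificates.", which suggests the container has no usable CA bundle for apt's HTTPS transport. A quick check, as a minimal sketch: it assumes the Debian/Ubuntu default bundle path /etc/ssl/certs/ca-certificates.crt (generated by update-ca-certificates from the ca-certificates package), and `has_ca_bundle` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a system CA bundle is present and non-empty.
# Assumes the Debian/Ubuntu default path; pass another path to override it.
has_ca_bundle() {
    bundle="${1:-/etc/ssl/certs/ca-certificates.crt}"
    # -s: true if the file exists and has a size greater than zero
    if [ -s "$bundle" ]; then
        echo "CA bundle found: $bundle"
        return 0
    fi
    echo "CA bundle missing or empty: $bundle (the ca-certificates package may not be installed)"
    return 1
}

# Example, inside the container:
#   has_ca_bundle || echo "apt HTTPS sources are likely to fail"
```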

austinmw, Apr 26 '22 18:04

Having the same issue

amritap-ef, May 05 '22 12:05

Thank you for reporting the issue! Please let us know if you are still facing the issue.

tejaschumbalkar, Aug 11 '22 00:08

Having the same issue with images using torch<=1.9. Any updates on how to mitigate this?

stefan-matcovici, Jan 17 '23 11:01

You can try adding this to your Dockerfile before running apt-get update:

# Workaround for CUDA Linux Repository Key Rotation
# https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772
# Removing the two NVIDIA source lists means apt-get update no longer contacts those repos.
# The key changes below only matter if the NVIDIA repos are added back later.
RUN rm /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list && \
    apt-key del 7fa2af80 && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
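The workaround above removes cuda.list and nvidia-ml.list by name, which matches the images in this issue but may not hold for every base image. A minimal sketch to list which `.list` files actually reference developer.download.nvidia.com before removing anything; `nvidia_source_files` is a hypothetical helper, and it does not inspect deb822-style `.sources` files.

```shell
#!/bin/sh
# Sketch: print the apt .list files in a directory that mention the NVIDIA repo host.
# Defaults to /etc/apt/sources.list.d; pass another directory to override it.
nvidia_source_files() {
    dir="${1:-/etc/apt/sources.list.d}"
    # -l: print matching file names only; -s: suppress errors for missing/unreadable files
    # (e.g. when the glob matches nothing and stays literal)
    grep -ls 'developer\.download\.nvidia\.com' "$dir"/*.list 2>/dev/null | sort
}

# Example, inside the container:
#   nvidia_source_files
```

Any file this prints would need the same treatment as cuda.list and nvidia-ml.list for apt-get update to stop contacting the NVIDIA repositories.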

public-git-ui, Apr 21 '23 15:04