deep-learning-containers
[bug] apt update errors due to failing NVIDIA certificate verification
Checklist
- [X] I've prepended issue tag with type of change: [bug]
- [X] (If applicable) I've attached the script to reproduce the bug
- [X] (If applicable) I've documented below the DLC image/dockerfile this relates to
- [X] (If applicable) I've documented below the tests I've run on the DLC image
- [X] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
 
Concise Description:
Unable to run apt update in SageMaker DLCs because NVIDIA certificate verification fails
To reproduce:
nvidia-docker run -it --rm 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker "apt update"
DLC image/dockerfile:
Multiple, for example: 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker
Current behavior:
root@a7301cb95566:/# apt update
Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  InRelease                                                   
Err:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release                                                                 
  Certificate verification failed: The certificate is NOT trusted. The certificate issuer is unknown.  Could not handshake: Error in the certificate verification. [IP: 152.195.19.142 443]
Err:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  Release                                                     
  Certificate verification failed: The certificate is NOT trusted. The certificate issuer is unknown.  Could not handshake: Error in the certificate verification. [IP: 152.195.19.142 443]
Get:5 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]                                                                                           
Get:6 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]                                                     
Get:7 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu focal InRelease [23.8 kB]             
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]                              
Get:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]                                   
Get:11 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu focal/main amd64 Packages [16.5 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]           
Get:15 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1139 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1216 kB]    
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.3 kB]    
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2188 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1154 kB]
Get:20 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [25.8 kB]
Get:21 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1773 kB]         
Get:22 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [26.0 kB]   
Get:23 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [51.2 kB]          
Get:24 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [870 kB]    
Reading package lists... Done                              
W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/InRelease: No system certificates available. Try installing ca-certificates.
W: https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/InRelease: No system certificates available. Try installing ca-certificates.
W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/Release: No system certificates available. Try installing ca-certificates.
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/Release: No system certificates available. Try installing ca-certificates.
E: The repository 'https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Expected behavior: apt update completes successfully
Additional context: This largely prevents using these images as base images for further builds
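For anyone triaging this, a quick way to confirm which apt source files point at the failing NVIDIA host is to grep the sources directory inside the container. This is only a sketch: the function name is made up for the example, and the directory defaults to the standard apt location, which I am assuming is where these images keep their NVIDIA entries.

```shell
# Print the apt source files that reference NVIDIA's download host.
# list_nvidia_sources is a hypothetical helper name; the optional argument
# overrides the directory to scan (defaults to the standard apt location).
list_nvidia_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    # -l prints only the names of matching files; -F matches the host literally
    grep -lF 'developer.download.nvidia.com' "$dir"/*.list 2>/dev/null
}
```

Calling list_nvidia_sources with no argument inside the container should print the paths of any source files that reference that host.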
Having the same issue
Thank you for reporting the issue! Please let us know if you are still facing the issue.
Having the same issue with images with torch<=1.9, any updates on how to mitigate?
You can try adding this to the Dockerfile before apt-get update:
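# Workaround for CUDA Linux Repository Key Rotation
# https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772
# Remove the NVIDIA apt sources so apt-get update no longer contacts them
# (-f keeps the build going if a file is absent in a given image)
RUN rm -f /etc/apt/sources.list.d/cuda.list
RUN rm -f /etc/apt/sources.list.d/nvidia-ml.list
# Delete the old CUDA repository key, then fetch the keys NVIDIA publishes for each repository
# (these URLs target ubuntu1804; adjust the path to match the image's Ubuntu release)
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub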
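If you would rather not delete the source files outright, a variation is to rename them so apt skips them but they can be restored later. This is a sketch of a different approach from the snippet above, not something confirmed by the DLC maintainers. It relies on apt only reading files in sources.list.d that end in .list or .sources. The function name is hypothetical.

```shell
# Rename every apt source file that references NVIDIA's download host
# so that apt no longer reads it. disable_nvidia_sources is a hypothetical
# helper name; the optional argument overrides the directory to process.
disable_nvidia_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    for f in "$dir"/*.list; do
        # Skip the unexpanded glob when the directory has no .list files
        [ -f "$f" ] || continue
        if grep -qF 'developer.download.nvidia.com' "$f"; then
            # apt ignores the file once it no longer ends in .list
            mv "$f" "$f.disabled"
        fi
    done
}
```

To undo it, rename the file back by dropping the .disabled suffix. In a Dockerfile, the function body could be run from a RUN step placed before apt-get update.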