[bug] Torch does not find GPU on pytorch-training:1.10.0-gpu-py38 container

Open sergii-ivakhno-kidsloop opened this issue 2 years ago • 0 comments

Concise Description:

Torch does not find CUDA when running the official SageMaker training container on a GPU instance.

DLC image/dockerfile:

sudo docker pull 763104351884.dkr.ecr.eu-west-2.amazonaws.com/pytorch-training:1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker

Current behavior:

sudo docker pull 763104351884.dkr.ecr.eu-west-2.amazonaws.com/pytorch-training:1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker
sudo docker run -it --entrypoint /bin/bash 709fa9395949
python -c "import torch; print(torch.cuda.is_available())" -> False

Expected behavior:

python -c "import torch; print(torch.cuda.is_available())" -> True

Additional context:

The same outcome is seen on a SageMaker Notebook instance (ml.p3.2xlarge, docker pull from the console) and on an EC2 p3.2xlarge instance.
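
To narrow this down further than the one-liner above, a slightly fuller check could be run inside the container. This is only a sketch: the function name `cuda_report` is made up for illustration, and using the presence of `nvidia-smi` on the PATH as a sign that the GPU driver is exposed to the container is an assumption, not something confirmed for this image.

```python
import shutil


def cuda_report():
    """Collect basic facts about GPU visibility in the current environment."""
    report = {
        # Assumption: nvidia-smi is only on PATH when the NVIDIA driver
        # has been made visible to this environment.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": False,
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        import torch
    except ImportError:
        # Without torch there is nothing more to inspect.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version torch was compiled against; None for CPU-only builds.
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` is set but `nvidia_smi_on_path` is False, that would suggest the GPU is not being exposed to the container rather than a problem with the torch build itself.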

sergii-ivakhno-kidsloop, Mar 06 '22 15:03