deep-learning-containers
Request for help with CUDA-enabled PyTorch
Hi
I want to know which instance type I should choose for this purpose. I tried p2, p4, g4, and g5 instances, but none of them worked for me. I am trying to install CellBender (https://github.com/broadinstitute/CellBender). One of the steps requires installing PyTorch (https://pytorch.org/get-started/locally/) so that I can use the --cuda option.
After installation, I should get:
python3
>>> import torch
>>> print(torch.cuda.is_available())
True
But this is not working for me.
I have checked for resources online: https://stackoverflow.com/questions/60987997/why-torch-cuda-is-available-returns-false-even-after-installing-pytorch-with (I could not find enough information there about the type of NVIDIA graphics card on AWS) and https://dziganto.github.io/aws/cuda/deep%20learning/gpu/python/pytorch/CUDA-on-AWS-for-Deep-Learning/ (this seems to be a very old publication, and it didn't work for me).
Can you please help? Thanks.
@hemantgujar,
Hi there! Just noticing you provided some interesting information, but none of it really works as provided and I don't know if you posted in the correct location. Thought I might be able to help out though.
This repo builds Deep Learning Containers to run on AWS. Looks like you need PyTorch. Which version? Are you trying to train a model or serve it? For the purposes of this repo, let's assume training. You should be able to follow the directions in the README to log in to ECR and pull an image to start with.
My guess is that the image you want is 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2. This is copied from available_images.md.
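For what it's worth, the login-and-pull step could be scripted roughly like this. It is only a sketch: it assumes the AWS CLI and Docker are installed and that your AWS credentials are already configured. The helper names (`registry_host`, `registry_region`, `ecr_pull_commands`) are made up for illustration, and the script only prints the commands instead of running them. The README's directions remain the authority here.

```python
# Image URI quoted in this thread (copied from available_images.md).
IMAGE = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2"
)


def registry_host(image_uri: str) -> str:
    """Return the registry part of an image URI (everything before the first '/')."""
    return image_uri.split("/", 1)[0]


def registry_region(image_uri: str) -> str:
    """Return the AWS region, assuming a host shaped like
    <account>.dkr.ecr.<region>.amazonaws.com."""
    return registry_host(image_uri).split(".")[3]


def ecr_pull_commands(image_uri: str) -> list:
    """Build (but do not run) the shell commands to log in to ECR and pull the image."""
    host = registry_host(image_uri)
    region = registry_region(image_uri)
    login = (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {host}"
    )
    pull = f"docker pull {image_uri}"
    return [login, pull]


if __name__ == "__main__":
    for command in ecr_pull_commands(IMAGE):
        print(command)
```

Printing the commands first lets you read them over before pasting them into a shell on the instance.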
Based on what I see from the Dockerfile in this repo, it will have CUDA 11.8 and conda, which means you should only need to pip install -e CellBender, if I'm reading the CellBender instructions correctly. Most of the work is already done for you with the image listed above.
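If torch.cuda.is_available() still comes back False, a slightly longer check can help narrow down where things go wrong. This is a minimal sketch, not an official diagnostic: `cuda_report` is a hypothetical helper name, and it only reads attributes that PyTorch exposes publicly (`torch.__version__`, `torch.version.cuda`, and the `torch.cuda` functions).

```python
def cuda_report() -> dict:
    """Collect basic facts about the local PyTorch build and GPU visibility."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only build, which can never see a GPU.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda shows a version but cuda_available is False, the problem is more likely on the host side (driver, container runtime, or how the container was started) than in the PyTorch install itself.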
Another thing I noticed is that you could use CellBender's own image, us.gcr.io/broad-dsde-methods/cellbender:latest, instead.
As for EC2 instance types, I would recommend P3/G4 and above. You do need to verify that the NVIDIA driver is installed in the AMI you select, as well as the NVIDIA Docker toolkit (the Deep Learning GPU AMIs have the drivers installed, and there is a PyTorch version). When you spin up the Docker image, make sure you use --gpus=all, and it "should" start working.
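To make the --gpus=all part concrete, here is a rough sketch that builds a one-off docker run command to check GPU visibility inside the container. Some assumptions: `gpu_check_command` is a name I invented, the image is the one mentioned earlier in this thread, and I'm assuming the image passes the trailing python command through to the container unchanged. The script prints the command and does not execute it.

```python
import shlex

# Image URI quoted earlier in this thread (from available_images.md).
IMAGE = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2"
)


def gpu_check_command(image: str, gpus: str = "all") -> list:
    """Build a docker run command that asks PyTorch inside the container
    whether it can see a GPU. Nothing is executed here."""
    return [
        "docker", "run", "--rm",
        f"--gpus={gpus}",  # without this flag the container gets no GPU access
        image,
        "python", "-c", "import torch; print(torch.cuda.is_available())",
    ]


if __name__ == "__main__":
    # shlex.join (Python 3.8+) quotes the arguments so the line can be pasted into a shell.
    print(shlex.join(gpu_check_command(IMAGE)))
```

If the printed command reports True when you run it on the instance, the driver, the toolkit, and the image are all working together, and what remains is installing CellBender on top.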
Again, this is my personal experience and I hope it helps you.