model-angelo
cuda not recognized
After installing v1.0.5, I am getting a message saying cuda is not recognized:
/opt/mamba/envs/model_angelo/lib/python3.10/site-packages/torch/amp/autocast_mode.py:204: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
I am running this on an NVIDIA V100 node, capable of handling recent CUDA libraries and drivers. Is there anything that can be done to force PyTorch to recognize CUDA and the GPU devices available on the node?
Hi,
What may work is reinstalling PyTorch with GPU support, as shown below. This should be done in the same conda environment. Please let me know if you have further issues.
python -mpip install torch torchvision torchaudio
If you continue having issues, you can also try the following conda command:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
That did not work. I figured it out. Can you be more specific in your installation statement? This worked:
conda install pytorch==2.0.0=py3.10_cuda11.8_cudnn8.7.0_0 torchvision torchaudio cudatoolkit pytorch-cuda=11.8 -c pytorch -c conda-forge -c nvidia
Wow @davidhoover, thanks so much for figuring this out. May I ask where you found this? I want to see if the part with pytorch=2.0.0...etc is the important bit, or if it's adding -c conda-forge and cudatoolkit explicitly.
I'm not sure if cudatoolkit needs to be explicit. I figured this out by trial and error, guided by running this test on a GPU node:
python -c 'import torch; print(torch.cuda.is_available())'
If the result is False, then model-angelo will not utilize GPUs. If True, then we're all good.
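For anyone who lands here with the same warning, a slightly more detailed version of that one-line check may help narrow things down. This is only a sketch, not part of model-angelo: the function name cuda_report and the dictionary keys are made up for illustration, and it relies only on standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count(), torch.cuda.get_device_name()).

```python
def cuda_report():
    """Collect the facts that decide whether PyTorch can use a GPU.

    The function name and dictionary keys are hypothetical, chosen
    for this example only.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "built_with_cuda": None,   # CUDA version the torch build targets
        "cuda_available": False,
        "devices": [],
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Run it inside the model_angelo environment on a GPU node. If built_with_cuda comes back as None, the installed PyTorch is a CPU-only build, which would explain the warning and suggests reinstalling a CUDA build. If it shows a CUDA version but cuda_available is still False, the problem is more likely on the driver or node side.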