
cuda not recognized

Open davidhoover opened this issue 1 year ago • 4 comments

After installing v1.0.5, I am getting a message saying cuda is not recognized:

/opt/mamba/envs/model_angelo/lib/python3.10/site-packages/torch/amp/autocast_mode.py:204: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling

I am running this on an Nvidia v100 node, capable of handling recent CUDA libraries and drivers. Is there anything that can be done to force pytorch to recognize cuda and the GPU devices available on the node?

davidhoover avatar Oct 10 '23 18:10 davidhoover

Hi,

What may work is reinstalling PyTorch with GPU support, like the following. This should be done in the same conda environment. Please let me know if you have further issues.

python -m pip install torch torchvision torchaudio

If you continue having issues, you can try the following conda command too

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
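(Editorial sketch, not part of the original thread: if plain pip resolves a CPU-only wheel, the CUDA-specific wheel index can be requested explicitly. The cu118 index URL below comes from pytorch.org's install selector; verify it against the current instructions for your CUDA version.)

```shell
# Install CUDA 11.8 builds explicitly from PyTorch's wheel index
# (run inside the model_angelo conda environment)
python -m pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu118
```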

jamaliki avatar Oct 11 '23 12:10 jamaliki

That did not work, but I figured it out on my own. Could you make the installation instructions more specific? This worked:

conda install pytorch==2.0.0=py3.10_cuda11.8_cudnn8.7.0_0 torchvision torchaudio cudatoolkit pytorch-cuda=11.8 -c pytorch -c conda-forge -c nvidia

davidhoover avatar Oct 12 '23 21:10 davidhoover

Wow @davidhoover, thanks so much for figuring this out. May I ask where you found this? I want to see whether the pinned pytorch==2.0.0... build string is the important bit, or whether it's adding -c conda-forge and cudatoolkit explicitly.

jamaliki avatar Oct 12 '23 21:10 jamaliki

I'm not sure if cudatoolkit needs to be explicit. I figured this out by trial and error, guided by running this test on a GPU node:

python -c 'import torch; print(torch.cuda.is_available())'

If the result is False, then model-angelo will not use the GPUs. If True, then we're all good.
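(Editorial sketch, not part of the original thread: a slightly more verbose check can distinguish a CPU-only torch build, where torch.version.cuda is None, from a driver or environment problem. Assumes nothing beyond a standard torch install; it degrades gracefully if torch is absent.)

```python
# Diagnostic sketch: is torch installed, was it built with CUDA,
# and can it actually see a GPU at runtime?
import importlib.util

if importlib.util.find_spec("torch") is None:
    print("torch is not installed in this environment")
else:
    import torch
    print("torch version:", torch.__version__)
    print("built with CUDA:", torch.version.cuda)  # None means a CPU-only build
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device 0:", torch.cuda.get_device_name(0))
```

If "built with CUDA" is None, reinstalling from the CUDA channels (as above) is the fix; if it shows a version but "CUDA available" is False, the node's driver or visibility settings are the more likely culprit.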

davidhoover avatar Oct 12 '23 21:10 davidhoover