model-angelo
ModelAngelo not using GPUs (2080 Ti, CUDA 11.4)
Hi everyone,
I installed ModelAngelo on our server (as described in the README) and noticed that Cα building took ~5 hours, with GPU utilization at 0 the whole time. Afterwards, I ran Python manually and checked whether torch can see the GPU:
>>> import torch
>>> torch.cuda.is_available()
False
>>>
which seems to be the reason. CUDA, however, seems to be working fine on the server:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0
(CUDA also works fine for other packages, e.g. RELION, cryoSPARC, you name it).
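For what it's worth, a quick way to see which torch build actually got installed (run inside the activated environment; a CPU-only build reports None for torch.version.cuda):

```python
import torch

# Which build of torch is installed, and can it see a GPU?
print("torch version:  ", torch.__version__)
print("built with CUDA:", torch.version.cuda)       # None on a CPU-only build
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```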
Could you please point me to what I should do to fix that?
Hi,
Thank you for your report. This is interesting and should not be happening; I think something may have gone wrong in the torch installation. Could you please run this command after activating the conda environment?
conda install -y pytorch torchvision torchaudio cudatoolkit=11.4 -c pytorch
I don't know why this would be an issue, but maybe it is the slight mismatch in the CUDA version.
I ran into this issue as well. I don't think conda is managing the dependencies for torch + CUDA correctly at installation. It is not installing the required "libtorch_cuda*.so" libraries, just the CPU libraries.
I was able to work around this by using mamba instead of conda which seems to handle the dependencies correctly.
Interesting! @jasonkey do you have the diffs you made to the installation script somewhere?
@jamaliki 11.4 is not yet available in conda:
PackagesNotFoundError: The following packages are not available from current channels:
- cudatoolkit=11.4
Current channels:
- https://conda.anaconda.org/pytorch/linux-64
- https://conda.anaconda.org/pytorch/noarch
- https://repo.anaconda.com/pkgs/main/linux-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/r/linux-64
- https://repo.anaconda.com/pkgs/r/noarch
No, I just ran the installation manually without using the script.
conda install mamba
mamba install cudatoolkit=11.3 pytorch torchvision torchaudio -c pytorch
worked for me and installed the missing "libtorch_cuda*.so" libraries.
I happen to still have it in my scrollback. In this case I downgraded pytorch intentionally, but you can see that the packages conda included are the cpu packages. These are replaced with the correct cuda versions with mamba.
- pytorch 1.13.0 py3.9_cpu_0
+ pytorch 1.12.1 py3.9_cuda11.3_cudnn8.3.2_0
- torchaudio 0.13.0 py39_cpu
+ torchaudio 0.12.1 py39_cu113
- torchvision 0.14.0 py39_cpu
+ torchvision 0.13.1 py39_cu113
I don't know why conda and mamba behave differently here.
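One quick way to spot this situation in `conda list` output: CPU builds carry a `cpu` tag in the build string, while CUDA builds carry a `cuda…` or `cu1xx` tag. A small sketch (the build strings are the ones from the diff above; the helper name is mine):

```python
def is_cuda_build(build: str) -> bool:
    """True if a conda build string indicates a CUDA-enabled build.

    CPU builds look like "py3.9_cpu_0" or "py39_cpu";
    CUDA builds look like "py3.9_cuda11.3_cudnn8.3.2_0" or "py39_cu113".
    """
    return "cpu" not in build

# Build strings taken from the diff above:
for build in ["py3.9_cpu_0", "py3.9_cuda11.3_cudnn8.3.2_0", "py39_cpu", "py39_cu113"]:
    print(f"{build:30} -> {'CUDA' if is_cuda_build(build) else 'CPU'}")
```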
@jasonkey I tried mamba -- didn't work for me either, surprisingly.
Namely, I did:
conda install mamba # btw, this one took sooo long
mamba install cudatoolkit=11.3 pytorch torchvision torchaudio -c pytorch
and then still got this:
$ python
>>> import torch
>>> torch.cuda.is_available()
False
>>>
Hmpf. This is really strange. Are you able to install PyTorch with GPU normally?
Yes. If I install a fresh virtual environment, I can see that CUDA is available:
$ python3 -m venv venv
$ source venv/bin/activate
$ # source: pytorch documentation: https://pytorch.org/get-started/locally/
(venv) $ python3 -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu114
# long output
(venv) $ python3
>>> import torch
>>> torch.cuda.is_available()
True
Interesting, does it just not work with the conda install? Maybe that is the issue
hi, where can I find position to choose the GPU?
You can specify the GPU with the --device flag. If you type model_angelo --help it should give you all of the options.
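For example (the map and sequence paths below are placeholders; I'm assuming the device string follows the usual PyTorch convention, e.g. cuda:1 for the second GPU):

```shell
# Show all available flags, including --device:
model_angelo build --help

# Run on a specific GPU (paths are placeholders):
model_angelo build -v map.mrc -pf sequence.fasta -o output --device cuda:1
```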
> Interesting, does it just not work with the conda install? Maybe that is the issue
Sorry, I didn't quite understand what you mean here :)
Ok, so it seems that there's no need for conda here -- installation with python venv works just fine:
$ # in model_angelo github folder
$ python3 -m venv env
$ source env/bin/activate
(env) $ python3 -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu114
(env) $ python3 -m pip install -r requirements.txt
(env) $ python3 setup.py install
That's awesome! Yeah that's what I meant :)
It is strange that the conda install did not work, but I'm glad you were able to install it anyway!
Ah, my bad -- after installing into a virtual environment, I got:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED
when model_angelo entered the GNN refinement stage (I assume because pip couldn't install cudatoolkit properly).
After a few hours, the solution that worked was this:
$ conda create -n model_angelo python=3.9 -y
$ conda activate model_angelo
(model_angelo) $ conda install -y pytorch pytorch-cuda=11.6 torchvision torchaudio cudatoolkit=11.6 -c nvidia -c pytorch
(model_angelo) $ python3 -m pip install -r requirements.txt
(model_angelo) $ python3 setup.py install
(model_angelo) $ export TORCH_HOME=/path/to/weights
(model_angelo) $ conda env config vars set TORCH_HOME="$TORCH_HOME"
(model_angelo) $ conda deactivate && conda activate model_angelo # necessary to enable TORCH_HOME in current session
After that, model_angelo works as expected (at least it gets to the GNN refinement stage, which wasn't happening before).
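To confirm the environment is healthy after the reactivation, something like this should show the weights path and a CUDA-enabled build (a sketch; exact values depend on your setup):

```shell
# Should print the weights path set above:
echo "$TORCH_HOME"

# Should print a CUDA version (e.g. 11.6) and True:
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```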
Interesting, so was it the cudatoolkit=11.6?
Yep. There's no 11.4 build on the pytorch channel, and pytorch kept being installed without GPU support. I needed to ask specifically for pytorch-cuda (and hence -c nvidia), and also for a higher cudatoolkit version for compatibility with pytorch (since pytorch-cuda=11.3 is, as far as I remember, unavailable there).
Thank you this is very useful. I will test to see if this change works on our cluster and then I will push it to the repo!
My GPU also doesn't work when I follow the README.
@zhihao-2022 could you make a new issue and add some information about how you installed the program, what kind of machine with what kind of operating system you have, and also whether pytorch is able to see the GPU?
Hello @jamaliki ,
I have a similar issue - CUDA is not available - but this fix doesn't work for me.
In more detail: ModelAngelo runs, but without CUDA. (I have CUDA 11.8; CentOS 7.)
The error:
(model_angelo) [caroline@lvx0862 model-angelo]$ model_angelo build -v /home/caroline/Documents/Phenix/new_heptamer_extraction_mask__with__best__J1427/classes_of__heptamers__from_cl_J1514__from__J1513/J1631__cl0/cryosparc_P4_J1631_007_volume_map.mrc -pf /home/caroline/Documents/Phenix/sequence/P11076.fasta -o output
---------------------------- ModelAngelo -----------------------------
By Kiarash Jamali, Scheres Group, MRC Laboratory of Molecular Biology
--------------------- Initial C-alpha prediction ---------------------
  0%| | 0/9261 [00:00<?, ?it/s]
/home/caroline/miniconda3/envs/model_angelo/lib/python3.9/site-packages/torch/amp/autocast_mode.py:250: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn(
/home/caroline/miniconda3/envs/model_angelo/lib/python3.9/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
^Z
[2]+ Stopped
- Reinstallation of torch with CUDA goes fine, but the error is still the same.
- The import torch check leads nowhere:
(model_angelo) [caroline@lvx0862 model-angelo]$ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
Moreover, when I type the import torch command inside the ModelAngelo conda env, it doesn't respond; a cross appears instead of the cursor. Nothing changes until I click the mouse, and when I do, the command disappears.
Could you please help?
thank you.
sincerely, Dmitry
Hi @DmitrySemchonok ,
Are you able to verify that the server has access to GPUs?
When you install torch alone in an environment, does torch.cuda.is_available() give you True?
If that is the case, could you try installing torch with CUDA and then installing ModelAngelo with the new pip install command and let me know if you still have issues?
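The checks above can be run roughly like this, in order, to separate the two failure modes (driver not visible vs. CPU-only torch build); a sketch, assuming the model_angelo environment is active:

```shell
# 1. Is the NVIDIA driver up and are GPUs visible at all?
nvidia-smi

# 2. Was torch built with CUDA support? (Prints None for a CPU-only build.)
python -c "import torch; print(torch.__version__, torch.version.cuda)"

# 3. Can torch actually reach a device?
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```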