RoseTTAFold

CUDA out of memory error

Status: Open. ryao-mdanderson opened this issue 3 years ago (5 comments).

Hello RoseTTAFold team,

Thank you for sharing the code and supporting the community.

I tested the code on our HPC cluster, following the README documentation, and hit a CUDA out-of-memory error:

$ python network/predict_complex.py -i example/complex_modeling/filtered.a3m -o complex -Ls 218 31
RuntimeError: CUDA out of memory. Tried to allocate 9.14 GiB (GPU 0; 15.78 GiB total capacity; 7.04 GiB already allocated; 6.08 GiB free; 8.36 GiB reserved in total by PyTorch)

From a Google search I found https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch and tried the following, but it did not help:

import torch
torch.cuda.empty_cache()
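
One way to read the numbers in the error above (my interpretation, assuming PyTorch's "reserved" figure includes the "already allocated" figure, as it does for its caching allocator): torch.cuda.empty_cache() can at most hand back the reserved-but-unallocated cache, 8.36 - 7.04 = 1.32 GiB here. Adding that to the 6.08 GiB already free gives about 7.40 GiB, still short of the 9.14 GiB request, which is consistent with empty_cache() not helping. The sketch below automates that arithmetic; parse_oom and headroom are hypothetical helper names, and the regex only matches the message wording quoted in this thread (other PyTorch releases phrase it differently).

```python
import re

# Regex for the PyTorch OOM wording quoted in this thread (an assumption:
# other PyTorch versions word the message differently).
_SIZE = r"([\d.]+) (GiB|MiB|KiB)"
_OOM = re.compile(
    rf"Tried to allocate {_SIZE} \(GPU (\d+); {_SIZE} total capacity; "
    rf"{_SIZE} already allocated; {_SIZE} free; {_SIZE} reserved in total by PyTorch\)"
)
_TO_GIB = {"GiB": 1.0, "MiB": 1 / 1024, "KiB": 1 / 1024**2}


def parse_oom(message: str) -> dict:
    """Extract the sizes (converted to GiB) from a CUDA OOM message."""
    match = _OOM.search(message)
    if match is None:
        raise ValueError("not a recognised CUDA out-of-memory message")
    g = match.groups()

    def gib(value: str, unit: str) -> float:
        return float(value) * _TO_GIB[unit]

    return {
        "requested": gib(g[0], g[1]),
        "gpu": int(g[2]),
        "total": gib(g[3], g[4]),
        "allocated": gib(g[5], g[6]),
        "free": gib(g[7], g[8]),
        "reserved": gib(g[9], g[10]),
    }


def headroom(info: dict) -> tuple:
    """Return (cached-but-unused GiB, best-case GiB available after empty_cache())."""
    cached_unused = info["reserved"] - info["allocated"]  # reserved includes allocated
    best_case = info["free"] + cached_unused  # upper bound: ignores fragmentation
    return cached_unused, best_case


if __name__ == "__main__":
    msg = (
        "RuntimeError: CUDA out of memory. Tried to allocate 9.14 GiB (GPU 0; "
        "15.78 GiB total capacity; 7.04 GiB already allocated; 6.08 GiB free; "
        "8.36 GiB reserved in total by PyTorch)"
    )
    info = parse_oom(msg)
    cached, best = headroom(info)
    print(
        f"cached but unused: {cached:.2f} GiB; best case available: {best:.2f} GiB; "
        f"requested: {info['requested']:.2f} GiB"
    )
```

Running it prints "cached but unused: 1.32 GiB; best case available: 7.40 GiB; requested: 9.14 GiB". The best-case figure is only an upper bound, because it assumes the cached blocks could be merged into one contiguous region.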

I also wanted to try the following, but I don't understand what 'variables' refers to. Should I run this on the command line or embed it in the Python code?

import gc
del variables
gc.collect()
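
In that Stack Overflow snippet, variables is a placeholder for the names of your own objects (for example, tensors you no longer need); it is not a built-in, and the calls only make sense inside the Python process that holds those objects, not on the command line. The pure-Python sketch below shows what del plus gc.collect() actually does; BigBuffer and del_then_collect are hypothetical names, with BigBuffer standing in for a large tensor. del only removes one name, and gc.collect() reclaims objects kept alive solely by reference cycles. Neither can shrink the memory a model needs for a single large allocation, so with PyTorch tensors this (followed by torch.cuda.empty_cache()) would only help if you were holding references you no longer use.

```python
import gc
import weakref
from typing import Tuple


class BigBuffer:
    """Hypothetical stand-in for a large tensor; not part of RoseTTAFold."""

    def __init__(self, nbytes: int):
        self.data = bytearray(nbytes)
        self.me = self  # deliberate reference cycle, so refcounting alone cannot free it


def del_then_collect() -> Tuple[bool, bool]:
    """Return (object alive after del, object alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        buf = BigBuffer(10_000_000)
        probe = weakref.ref(buf)  # observes the object without keeping it alive
        del buf  # removes the name 'buf'; the cycle still holds the object
        alive_after_del = probe() is not None
        gc.collect()  # the cycle collector reclaims the unreachable cycle
        alive_after_gc = probe() is not None
        return alive_after_del, alive_after_gc
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(del_then_collect())  # (True, False)
```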

If you have any suggestions, I would much appreciate them!

ryao-mdanderson, Jul 27 '21 22:07

In my case, I can only run predict_complex.py on an NVIDIA V100X or higher; even a V100 fails with a similar error. Which type of GPU were you using? The V100X has 32 GB of memory, but the V100 has only 16 GB.

qiyubio, Jul 30 '21 18:07

@qiyubio 👍 You are right! I have to use a GPU node with 32 GB of GPU memory (V100X). My initial test ran on a GPU with 16 GB of memory, which caused the CUDA out-of-memory error.
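
Given the experience reported here (a 16 GB V100 fails and a 32 GB V100X works for predict_complex.py), a cheap pre-flight check could refuse to start on a card that is too small instead of failing deep inside the run. This is a sketch assuming PyTorch is installed: MIN_GIB = 30.0 is a hypothetical threshold inferred from this thread, not a documented RoseTTAFold requirement, and pick_gpu / visible_gpu_totals_gib are made-up helper names. torch.cuda.is_available, torch.cuda.device_count and torch.cuda.get_device_properties(i).total_memory are standard PyTorch calls.

```python
from typing import List, Optional

MIN_GIB = 30.0  # hypothetical threshold inferred from this thread (16 GB failed, 32 GB worked)


def pick_gpu(totals_gib: List[float], min_gib: float = MIN_GIB) -> Optional[int]:
    """Return the index of the first GPU with at least min_gib of memory, else None."""
    for index, total in enumerate(totals_gib):
        if total >= min_gib:
            return index
    return None


def visible_gpu_totals_gib() -> List[float]:
    """Total memory of each visible CUDA device in GiB; empty if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory / 1024**3
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    totals = visible_gpu_totals_gib()
    choice = pick_gpu(totals)
    if choice is None:
        raise SystemExit(
            f"No GPU with >= {MIN_GIB} GiB found (saw {totals}); "
            "predict_complex.py is likely to run out of memory."
        )
    print(f"Using GPU {choice} ({totals[choice]:.2f} GiB)")
```

On a multi-GPU node, the chosen index could then be exported through the CUDA_VISIBLE_DEVICES environment variable before launching the prediction script.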

ryao-mdanderson, Jul 30 '21 18:07

@ryao-mdanderson Hi, I am using an RTX 3090 with 24 GB of video memory, but when I run the PyRosetta script on the example input I get the following error (in network.stderr):

RuntimeError: CUDA out of memory. Tried to allocate 7.86 GiB (GPU 0; 23.70 GiB total capacity; 12.40 GiB already allocated; 491.69 MiB free; 21.51 GiB reserved in total by PyTorch)

Do you have any suggestions? Thank you very much!

275145, Feb 08 '22 03:02

@275145 My guess is that your 24 GB of GPU memory is not enough for this application; the "RuntimeError: CUDA out of memory" message also suggests this. Hope this helps.

ryao-mdanderson, Feb 08 '22 04:02

@ryao-mdanderson I only used the 138-amino-acid sequence from the example to run RoseTTAFold's PyRosetta script. According to the report, RoseTTAFold is user-friendly, and an RTX 2080 graphics card can model a 400-amino-acid sequence in just ten minutes. In short, I think my device should be sufficient or even better than needed, so running out of video memory here is very hard to understand.

275145, Feb 08 '22 04:02