
CUDA out of memory

Open zzy221127 opened this issue 3 years ago • 1 comments

Dear author:

I ran FastFold on a 4-GPU machine, where each GPU has 24 GiB of memory.

I ran inference.py on a FASTA sequence of length 1805 AA (without Triton), with the parameter --gpus 3,

and the error printed is:

RuntimeError: CUDA out of memory. Tried to allocate 29.26 GiB (GPU 0; 23.70 GiB total capacity; 9.63 GiB already allocated; 11.79 GiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
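As a first stopgap, the allocator hint at the end of that message can be tried before any code changes. A minimal sketch, assuming inference is launched from a shell; the value 128 is an arbitrary example, not a recommended setting:

```shell
# Cap the size of cached allocator blocks to reduce fragmentation,
# as suggested by the error message itself. 128 MB is an example value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# ...then launch inference.py as usual from the same shell.
```

This only mitigates fragmentation; it cannot help when a single allocation (here 29.26 GiB) exceeds the GPU's total capacity.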

My questions are:

  1. Why is only one GPU (GPU 0, rather than GPU 0, GPU 1, and GPU 2) used when computing the total memory? What should I do to get around this?

  2. Is there a way to run extremely long FASTA files, e.g. around 4000 AA?

I appreciate your reply, thank you.

zzy221127 avatar Nov 28 '22 15:11 zzy221127

I think you should check args.gpus in the code. It should be 3 if you passed the parameter correctly.

AlphaFold's embedding representations take up a lot of memory as the sequence length increases. To reduce memory usage, you should add the parameters --chunk_size [N] and --inplace to the command line or to the shell script ./inference.sh. The smaller you set N, the less memory is used, but inference becomes slower.
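Putting the suggestions together, an invocation might look like the following sketch. The FASTA path and any positional arguments are placeholders, not FastFold's actual documented interface; only --gpus, --chunk_size, and --inplace come from this thread, and N=4 is an arbitrary starting point to tune downward if OOM persists:

```shell
# Hypothetical invocation: flags per the reply above; paths are placeholders.
# Smaller --chunk_size values use less memory but run slower.
python inference.py target.fasta \
    --gpus 3 \
    --chunk_size 4 \
    --inplace
```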

Shenggan avatar Nov 29 '22 01:11 Shenggan