protein length vs. GPU memory
Dear team, thanks for releasing the code of this powerful tool.
I tested it on a few complex targets of my interest; the results are very promising. However, it failed when inferring very large protein targets. One of my targets is a homodimer with 2100 AAs in total. Another target has >5000 AAs. We have A40 40 GB GPUs on our cluster. Has your team benchmarked GPU memory consumption vs. protein length?
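(In the meantime, a rough way to profile peak GPU memory against protein length is to poll nvidia-smi while a run is in progress; a minimal sketch, assuming a single-GPU run and that nvidia-smi is on the PATH, with a hypothetical input file name:)

# log GPU memory once per second while boltz runs, then report the peak
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > mem_log.txt &
SMI_PID=$!
boltz predict my_target.yaml --use_msa_server   # hypothetical input file
kill $SMI_PID
sort -n mem_log.txt | tail -1                   # peak memory used, in MiB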
Hi! We are currently preparing a low memory mode that should be able to run the 2100AA, the 5000 might be more difficult. We'll report back with what this new feature allows very soon!
Thank you for the quick response. Looking forward to seeing the update!
Does the --devices option help with the memory issue? For example, the ligand.fasta case failed on my machine with an RTX 4090. I tried using --devices 2 (my machine has two RTX 4090s), but it failed. I also reported an issue (#51) about the --devices option for a smaller case that a single GPU can handle fine.
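(For reference, a minimal two-GPU invocation would look like the sketch below. My understanding, which may be wrong, is that --devices distributes separate input structures across GPUs rather than sharding a single large prediction, so it may not reduce per-structure memory:)

# hypothetical example: two RTX 4090s, single input file
boltz predict examples/ligand.fasta --devices 2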
Any updates on limiting the memory @jwohlwend? I'm trying to run on a system with 4644 AAs. Ideally, I'd also like to be able to run on a system with 4548 AAs plus a SMILES representing several copies of an 8-residue peptide. I have 4x A100s, and I'm running out of memory with systems half that size.
Here is my YAML:
version: 1  # Optional, defaults to 1
sequences:
  - protein:
      id: [A, B, C, D, E, F]
      sequence: MGDWSALGRLLDKVQAYSTAGGKVWLSVLFIFRILLLGTAVESAWGDEQSAFVCNTQQPGCENVCYDKSFPISHVRFWVLQIIFVSTPTLLYLAHVFYLMRKEEKLNRKEEELKMVQNEGGNVDMHLKQIEIKKFKYGLEEHGKVKMRGGLLRTYIISILFKSVFEVGFIIIQWYMYGFSLSAIYTCKRDPCPHQVDCFLSRPTEKTIFIWFMLIVSIVSLALNIIELFYVTYKSIKDGIKGKKDPFSATNDAVISGKECGSPKYAYFNGCSSPTAPMSPPGYKLVTGERNPSSCRNYNKQASEQNWANYSAEQNRMGQAGSTISNTHAQPFDFSDEHQNTKKMAPGHEMQPLTILDQRPSSRASSHASSRPRPDDLEI
  - protein:
      id: [G, H, I, J, K, L]
      sequence: FSLESERP
  # - ligand:
  #     id: [G, H, I, J, K, L]
  #     smiles: CC(C)C[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CCC[C@H]1C(=O)O
I'm running it as follows:
(boltz-1) lily@il-gpu04:~/amelie/Workspace/boltz$ boltz predict examples/connexin_FSLESERP.yaml --recycling_steps 20 --diffusion_samples 10 --use_msa_server
Checking input data.
Running predictions for 1 structure
Processing input data.
0%| | 0/1 [00:00<?, ?it/s]Generating MSA for examples/Cx43_Xenopus_laevis_MGTFEEVP.yaml with 2 protein entities.
COMPLETE: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [elapsed: 00:01 remaining: 00:00]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.55s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/lily/mambaforge/envs/boltz-1/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Predicting DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]|
WARNING: ran out of memory, skipping batch
Predicting DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 0.41it/s]
Number of failed examples: 1
Predicting DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 0.41it/s]
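(One thing I may try, though I'm not sure it actually lowers peak memory, is dropping back to the default recycling/diffusion settings, since the extra diffusion samples may be batched together on the GPU:)

# same input, default sampling settings (a guess at a lower-memory run, not a confirmed fix)
boltz predict examples/connexin_FSLESERP.yaml --use_msa_server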
I am experiencing the same issue. It seems that the chunking feature is working now; could you provide a tutorial on how to use it? I have updated Boltz-1 to version 0.3.0, but it still runs out of memory when predicting a large protein on an A100-80GB. Below is the input file I used:
version: 1
sequences:
  - protein:
      id: [A, B, C, D]
      sequence: MPVRRGHVAPQNTFLDTIIRKFEGQSRKFIIANARVENCAVIYCNDGFCELCGYSRAEVMQRPCTCDFLHGPRTQRRAAAQIAQALLGAEERKVEIAFYRKDGSCFLCLVDVVPVKNEDGAVIMFILNFEVVMEKDMVGSSPTSDREIIAPKIKERTHNVTEKVTQVLSLGADVLPEYKLQAPRIHRWTILHYSPFKAVWDWLILLLVIYTAVFTPYSAAFLLKETEEGPPATECGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWYAIGNMEQPHMDSRIGWLHNLGDQIGKPYNSSGLGGPSIKDKYVTALYFTFSSLTSVGFGNVSPNTNSEKIFSICVMLIGSLMYASIFGNVSAIIQRLYSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQADICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGSIEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYPEFSDHFWSSLEITFNLRDTNMIPGGRQYQELPRCPAPTPSLLNIPLSSPGRRPRGDVESRLDALQRQLNRLETRLSADMATVLQLLQRQMTLVPPAYSAVTTPGPGPTSTSPLLPVSPLPTLTLDSLSQVSQFMACEELPPGAPELPQEGPTRRLSLPGQLGALTSQPLHRHGSDPGSLEVLFQ
  - ligand:
      id: [E]
      smiles: O=C1NCCN1CCN1CCC(c2cn(-c3ccc(F)cc3)c3ccc(Cl)cc23)CC1
I am facing the same problem even with a small protein (193 aa) and a small RNA (55 nt).
version: 1
sequences:
  - protein:
      id: [A]
      sequence: SNMDGRIVIVDDEPITRLDIRDIVIEAGYEVVGEAADGFEAIEVCKKTQPDLVLMDIQMPILDGLKAGKKIVQDQLASSIVFLSAYSDVQNTDKAKKLGALGYLVKPLDEKSLIPTIEMSIERGKQTQLLLSQIDKLSLKLEERKIIEKAKGILVKENHISEEEAYQMLRTLSMNKRARMSEIAELIVMDDE
      msa: ./boltz_results_6ww6/msa/tmp_env/uniref.a3m
  - rna:
      id: [B]
      sequence: AUCGGCAAAGGAGCCCA
and the output is
(boltz) shirehorse@shirehorse-Precision-3460:~/Desktop/learning_stuff/boltz$ boltz predict 6ww6.yaml --recycling_steps 10 --diffusion_samples 1
Checking input data.
Running predictions for 1 structure
Processing input data.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12.11it/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/shirehorse/miniconda3/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA RTX A2000 12GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]| WARNING: ran out of memory, skipping batch
Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.08it/s]Number of failed examples: 1
Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.07it/s]
Hi, in v0.3.2 we pushed a few memory improvements. Could you test whether this issue still occurs?
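(Upgrading and re-running should look something like the sketch below; the PyPI package name is assumed here:)

pip install -U boltz                                        # assumption: the package is published as "boltz"
boltz predict 6ww6.yaml --recycling_steps 10 --diffusion_samples 1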
It still occurs for me.
version: 1
sequences:
  - protein:
      id: [A, B, C, D]
      sequence: MPVRRGHVAPQNTFLDTIIRKFEGQSRKFIIANARVENCAVIYCNDGFCELCGYSRAEVMQRPCTCDFLHGPRTQRRAAAQIAQALLGAEERKVEIAFYRKDGSCFLCLVDVVPVKNEDGAVIMFILNFEVVMEKDMVGSSPTSDREIIAPKIKERTHNVTEKVTQVLSLGADVLPEYKLQAPRIHRWTILHYSPFKAVWDWLILLLVIYTAVFTPYSAAFLLKETEEGPPATECGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWYAIGNMEQPHMDSRIGWLHNLGDQIGKPYNSSGLGGPSIKDKYVTALYFTFSSLTSVGFGNVSPNTNSEKIFSICVMLIGSLMYASIFGNVSAIIQRLYSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQADICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGSIEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYPEFSDHFWSSLEITFNLRDTNMIPGGRQYQELPRCPAPTPSLLNIPLSSPGRRPRGDVESRLDALQRQLNRLETRLSADMATVLQLLQRQMTLVPPAYSAVTTPGPGPTSTSPLLPVSPLPTLTLDSLSQVSQFMACEELPPGAPELPQEGPTRRLSLPGQLGALTSQPLHRHGSDPGSLEVLFQ
  - ligand:
      id: [E]
      smiles: O=C1NCCN1CCN1CCC(c2cn(-c3ccc(F)cc3)c3ccc(Cl)cc23)CC1
Hi @gcorso,
Thanks for the update. It works! I actually requested 2 protein chains and 2 RNA chains for this task (PDB ID: 6WW6), and computationally it worked on my GPU (NVIDIA RTX A2000 12GB). However, the predicted structure of the monomer (1 protein + 1 RNA) was closer to the solved structure; the model with 2 proteins and 2 RNAs was not meaningful.
In case you want to reproduce the results I got, you just need to use the short RNA sequence that was actually solved, not the FASTA sequence on the PDB page. I have updated the sequence in the previous comment as well.
Hi @xiaolinpan, unfortunately at the moment we do not support predictions of structures with >> 2000 residues/tokens on regular GPUs; the structure in your example has over 3200 residues.
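(For anyone checking their own inputs, a rough way to estimate the token count is to multiply each sequence length by its number of chain copies; a sketch in bash, assuming the sequence is pasted into a shell variable and that the ligand adds roughly one token per heavy atom:)

SEQ="MPVRRGHV..."             # paste the full protein sequence here
echo $(( ${#SEQ} * 4 ))       # 4 chains (A, B, C, D); the ligand adds a few dozen more tokens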
Hi @Jameel9, this is unfortunate. We are actively working on improving the quality of the model on protein + RNA complexes!
Thank you for your reply. I also tried using AlphaFold3 to predict this structure, and it was able to generate a model for this large protein. Maybe AlphaFold3 made some optimizations for memory usage.
Yes, it's definitely possible that they made some extra optimizations. We are working on improving these, but we definitely welcome contributions from the community!
@xiaolinpan We just wrapped up testing our Boltz-1 implementation on H200s (~140GB VRAM) and successfully folded proteins up to 3,500 AAs in length.
If you're interested, you can try it out on our webserver here: https://neurosnap.ai/service/Boltz-1%20(AlphaFold3).
Full disclosure, I am affiliated with Neurosnap.
Hi, is this issue still present? I am trying to do structure prediction for a tetramer; one sequence is 1159 AA in length, so it is 4636 AA in total. Is this too large? I am running into memory issues.