
protein length vs. GPU memory

Open tommyhuangthu opened this issue 1 year ago • 15 comments

Dear team, thanks for releasing the code of this powerful tool.

I tested it on a few complex targets of my interest; the results are very promising. However, it failed when inferring very large protein targets. One of my targets is a homodimer with 2100 AAs in total. Another target has >5000 AAs. We have A40 40 GB GPUs on our cluster. Has your team benchmarked GPU memory consumption vs. protein length?

tommyhuangthu avatar Nov 19 '24 01:11 tommyhuangthu
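
Note: as rough intuition rather than an official benchmark, Boltz-1 is an AlphaFold3-style model that keeps an N x N pair representation, so activation memory grows roughly quadratically with token count. Below is a minimal back-of-envelope sketch; the pair channel width and dtype are assumptions, and real peak usage is many times larger once attention buffers and triangle updates are counted.

def pair_rep_bytes(n_tokens, c_pair=128, bytes_per_value=2):
    # One (N x N x c_pair) pair tensor in fp16/bf16. Peak memory during inference is a
    # large, version-dependent multiple of this single-tensor lower bound.
    return n_tokens * n_tokens * c_pair * bytes_per_value

for n in (1000, 2100, 3300, 5000):
    print(n, round(pair_rep_bytes(n) / 1e9, 1), "GB for one pair tensor (lower bound)")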

Hi! We are currently preparing a low-memory mode that should be able to run the 2100AA case; the 5000AA one might be more difficult. We'll report back with what this new feature allows very soon!

jwohlwend avatar Nov 19 '24 01:11 jwohlwend

Hi! We are currently preparing a low-memory mode that should be able to run the 2100AA case; the 5000AA one might be more difficult. We'll report back with what this new feature allows very soon!

Thank you for the quick response. Looking forward to seeing the update!

tommyhuangthu avatar Nov 19 '24 03:11 tommyhuangthu

Does the --devices option help with the memory issue? For example, the ligand.fasta case failed on my machine with an RTX 4090. I tried using --devices 2 (my machine has two RTX 4090s), but it failed. I also reported an issue (#51) about the --devices option for a smaller case that a single GPU can handle well.

jiaboli007 avatar Nov 23 '24 19:11 jiaboli007
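
Note: Boltz-1 runs prediction through PyTorch Lightning (visible in the logs below), and as far as I can tell multiple devices are used for data parallelism across inputs rather than for splitting one structure's tensors across GPUs, so a complex that overflows a single RTX 4090 will still overflow with --devices 2. For throughput over several independent inputs, a hedged sketch of launching one single-GPU process per input follows; the input file names are hypothetical.

import itertools, os, subprocess

inputs = ["target1.yaml", "target2.yaml"]   # hypothetical input files, one structure each
gpus = ["0", "1"]                           # the two RTX 4090s

procs = []
for yaml_path, gpu in zip(inputs, itertools.cycle(gpus)):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    # Each boltz process sees exactly one GPU; per-structure memory is unchanged.
    # Naive scheduling: with more inputs than GPUs, several processes share a GPU concurrently.
    procs.append(subprocess.Popen(["boltz", "predict", yaml_path, "--use_msa_server"], env=env))
for p in procs:
    p.wait()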

Any updates on limiting the memory @jwohlwend? I'm trying to run on a system with 4644 AAs. Ideally, I'd also like to be able to run on a system with 4548 AAs plus a SMILES representing several copies of an 8-residue peptide. I have 4x A100s, and I'm running out of memory with systems half that size.

Here is my YAML:

version: 1  # Optional, defaults to 1
sequences:
  - protein:
      id: [A,B,C,D,E,F]
      sequence: MGDWSALGRLLDKVQAYSTAGGKVWLSVLFIFRILLLGTAVESAWGDEQSAFVCNTQQPGCENVCYDKSFPISHVRFWVLQIIFVSTPTLLYLAHVFYLMRKEEKLNRKEEELKMVQNEGGNVDMHLKQIEIKKFKYGLEEHGKVKMRGGLLRTYIISILFKSVFEVGFIIIQWYMYGFSLSAIYTCKRDPCPHQVDCFLSRPTEKTIFIWFMLIVSIVSLALNIIELFYVTYKSIKDGIKGKKDPFSATNDAVISGKECGSPKYAYFNGCSSPTAPMSPPGYKLVTGERNPSSCRNYNKQASEQNWANYSAEQNRMGQAGSTISNTHAQPFDFSDEHQNTKKMAPGHEMQPLTILDQRPSSRASSHASSRPRPDDLEI
  - protein:
      id: [G,H,I,J,K,L]
      sequence: FSLESERP
  # - ligand:
  #     id: [G,H,I,J,K,L]
  #     smiles: CC(C)C[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CCC[C@H]1C(=O)O

I'm running it as follows:

(boltz-1) lily@il-gpu04:~/amelie/Workspace/boltz$ boltz predict examples/connexin_FSLESERP.yaml --recycling_steps 20  --diffusion_samples 10 --use_msa_server
Checking input data.
Running predictions for 1 structure
Processing input data.
  0%|                                                                                                                                                        | 0/1 [00:00<?, ?it/s]Generating MSA for examples/Cx43_Xenopus_laevis_MGTFEEVP.yaml with 2 protein entities.
COMPLETE: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [elapsed: 00:01 remaining: 00:00]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.55s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/lily/mambaforge/envs/boltz-1/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Predicting DataLoader 0:   0%|                                                                                                                               | 0/1 [00:00<?, ?it/s]| 
WARNING: ran out of memory, skipping batch
Predicting DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  0.41it/s]
Number of failed examples: 1
Predicting DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  0.41it/s]

amelie-iska avatar Nov 24 '24 04:11 amelie-iska
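
Note: for inputs like the ones in this thread, it can help to sanity-check the approximate token count before submitting a job. Below is a minimal sketch; it counts one token per polymer residue times the number of chain copies, ignores ligands (roughly one token per heavy atom), and is only an estimate since the real tokenization is internal to Boltz.

import yaml  # PyYAML

def approx_polymer_tokens(path):
    with open(path) as fh:
        doc = yaml.safe_load(fh)
    total = 0
    for entry in doc.get("sequences", []):
        for kind, item in entry.items():
            if kind in ("protein", "rna", "dna"):
                ids = item["id"]
                copies = len(ids) if isinstance(ids, list) else 1
                total += copies * len(item["sequence"])
    return total

# For the YAML above: 6 copies of the ~380-residue chain plus 6 copies of the 8-mer, i.e. 2300+ tokens.
print(approx_polymer_tokens("examples/connexin_FSLESERP.yaml"))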

I am experiencing the same issue. It seems that the chunking feature is available now; could you provide a tutorial on how to use it? I have updated Boltz-1 to version 0.3.0, but it still runs out of memory when predicting a large protein on an A100-80GB. Below is the input file I used:

version: 1
sequences:
  - protein:
      id: [A, B, C, D]
      sequence: MPVRRGHVAPQNTFLDTIIRKFEGQSRKFIIANARVENCAVIYCNDGFCELCGYSRAEVMQRPCTCDFLHGPRTQRRAAAQIAQALLGAEERKVEIAFYRKDGSCFLCLVDVVPVKNEDGAVIMFILNFEVVMEKDMVGSSPTSDREIIAPKIKERTHNVTEKVTQVLSLGADVLPEYKLQAPRIHRWTILHYSPFKAVWDWLILLLVIYTAVFTPYSAAFLLKETEEGPPATECGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWYAIGNMEQPHMDSRIGWLHNLGDQIGKPYNSSGLGGPSIKDKYVTALYFTFSSLTSVGFGNVSPNTNSEKIFSICVMLIGSLMYASIFGNVSAIIQRLYSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQADICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGSIEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYPEFSDHFWSSLEITFNLRDTNMIPGGRQYQELPRCPAPTPSLLNIPLSSPGRRPRGDVESRLDALQRQLNRLETRLSADMATVLQLLQRQMTLVPPAYSAVTTPGPGPTSTSPLLPVSPLPTLTLDSLSQVSQFMACEELPPGAPELPQEGPTRRLSLPGQLGALTSQPLHRHGSDPGSLEVLFQ
  - ligand:
      id: [E]
      smiles: O=C1NCCN1CCN1CCC(c2cn(-c3ccc(F)cc3)c3ccc(Cl)cc23)CC1

xiaolinpan avatar Nov 29 '24 03:11 xiaolinpan

I am facing the same problem even with a small protein (193 aa) and a small RNA (55 nt).

version: 1
sequences:
  - protein:
      id: [A]
      sequence: SNMDGRIVIVDDEPITRLDIRDIVIEAGYEVVGEAADGFEAIEVCKKTQPDLVLMDIQMPILDGLKAGKKIVQDQLASSIVFLSAYSDVQNTDKAKKLGALGYLVKPLDEKSLIPTIEMSIERGKQTQLLLSQIDKLSLKLEERKIIEKAKGILVKENHISEEEAYQMLRTLSMNKRARMSEIAELIVMDDE
      msa: ./boltz_results_6ww6/msa/tmp_env/uniref.a3m
  - rna:
      id: [B]
      sequence: AUCGGCAAAGGAGCCCA

and the output is

(boltz) shirehorse@shirehorse-Precision-3460:~/Desktop/learning_stuff/boltz$ boltz predict 6ww6.yaml --recycling_steps 10 --diffusion_samples 1
Checking input data.
Running predictions for 1 structure
Processing input data.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12.11it/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/shirehorse/miniconda3/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA RTX A2000 12GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0:   0%|                                                                                                         | 0/1 [00:00<?, ?it/s]| WARNING: ran out of memory, skipping batch
Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.08it/s]Number of failed examples: 1
Predicting DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.07it/s]

Jameel9 avatar Dec 02 '24 04:12 Jameel9

Hi, in v0.3.2 we pushed a few memory improvements. Could you test whether this issue still occurs?

gcorso avatar Dec 12 '24 14:12 gcorso
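
Note: if you are re-testing, it may be worth confirming the installed version first, since several of the reports above predate v0.3.2. A quick check, assuming the package is installed under the distribution name boltz:

import importlib.metadata

# Prints the installed Boltz version; re-run the failing job only if this is >= 0.3.2.
print(importlib.metadata.version("boltz"))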

It still occurs for me.

version: 1
sequences:
  - protein:
      id: [A, B, C, D]
      sequence: MPVRRGHVAPQNTFLDTIIRKFEGQSRKFIIANARVENCAVIYCNDGFCELCGYSRAEVMQRPCTCDFLHGPRTQRRAAAQIAQALLGAEERKVEIAFYRKDGSCFLCLVDVVPVKNEDGAVIMFILNFEVVMEKDMVGSSPTSDREIIAPKIKERTHNVTEKVTQVLSLGADVLPEYKLQAPRIHRWTILHYSPFKAVWDWLILLLVIYTAVFTPYSAAFLLKETEEGPPATECGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWYAIGNMEQPHMDSRIGWLHNLGDQIGKPYNSSGLGGPSIKDKYVTALYFTFSSLTSVGFGNVSPNTNSEKIFSICVMLIGSLMYASIFGNVSAIIQRLYSGTARYHTQMLRVREFIRFHQIPNPLRQRLEEYFQHAWSYTNGIDMNAVLKGFPECLQADICLHLNRSLLQHCKPFRGATKGCLRALAMKFKTTHAPPGDTLVHAGDLLTALYFISRGSIEILRGDVVVAILGKNDIFGEPLNLYARPGKSNGDVRALTYCDLHKIHRDDLLEVLDMYPEFSDHFWSSLEITFNLRDTNMIPGGRQYQELPRCPAPTPSLLNIPLSSPGRRPRGDVESRLDALQRQLNRLETRLSADMATVLQLLQRQMTLVPPAYSAVTTPGPGPTSTSPLLPVSPLPTLTLDSLSQVSQFMACEELPPGAPELPQEGPTRRLSLPGQLGALTSQPLHRHGSDPGSLEVLFQ
  - ligand:
      id: [E]
      smiles: O=C1NCCN1CCN1CCC(c2cn(-c3ccc(F)cc3)c3ccc(Cl)cc23)CC1

xiaolinpan avatar Dec 12 '24 15:12 xiaolinpan

Hi @gcorso,

Thanks for the update. It works! I actually requested 2 protein chains and 2 RNA chains for this task (PDB ID: 6WW6), and computationally it worked on my GPU (NVIDIA RTX A2000 12GB). However, the predicted structure of the monomer (1 protein + 1 RNA) was closer to the solved structure; the model with 2 proteins and 2 RNAs was not meaningful.

In case you want to reproduce the results I got, you just need to use the short RNA sequence that was actually solved, not the FASTA sequence on the PDB page. I have updated the sequence in the previous comment as well.

Jameel9 avatar Dec 12 '24 22:12 Jameel9

Hi @xiaolinpan, unfortunately at the moment we do not support prediction of structures with >>2000 residues/tokens on regular GPUs; the structure in your example has over 3200 residues.

gcorso avatar Dec 18 '24 16:12 gcorso
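
Note: a hedged sketch of the arithmetic behind that count; the per-chain length and ligand size below are approximations, not figures from the maintainers.

approx_chain_len = 820     # approximate length of the protein sequence in the YAML above (assumption)
copies = 4                 # chains A, B, C, D
approx_ligand_tokens = 30  # rough heavy-atom count for the SMILES ligand (assumption)
# About 3300 tokens in total, well beyond the ~2000-residue/token range quoted above.
print(copies * approx_chain_len + approx_ligand_tokens)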

Hi @Jameel9, this is unfortunate. We are actively working on improving the quality of the model on protein + RNA complexes!

gcorso avatar Dec 18 '24 16:12 gcorso

Hi @xiaolinpan, unfortunately at the moment we do not support prediction of structures with >>2000 residues/tokens on regular GPUs; the structure in your example has over 3200 residues.

Thank you for your reply. I also tried using AlphaFold3 to predict this structure, and it was able to generate a prediction for this large protein. Maybe AlphaFold3 made some optimizations for memory usage.

xiaolinpan avatar Dec 18 '24 16:12 xiaolinpan

Yes, it's definitely possible that they made some extra optimizations. We are working on improving these, but we definitely welcome contributions from the community!

gcorso avatar Dec 27 '24 16:12 gcorso

@xiaolinpan We just wrapped up testing our Boltz-1 implementation on H200s (~140GB VRAM) and successfully folded proteins up to 3,500 AAs in length.

If you're interested, you can try it out on our webserver here: https://neurosnap.ai/service/Boltz-1%20(AlphaFold3).

Full disclosure, I am affiliated with Neurosnap.

KeaunAmani avatar Feb 27 '25 06:02 KeaunAmani

Hi, is this issue still present? I am trying to do structure prediction for a tetramer; each chain is 1159 AA in length, so it is 4636 AA in total. Is this too large? I am running into memory issues.

khuddzu avatar Nov 26 '25 02:11 khuddzu