GPU Memory Limit issue
I'm running out of GPU memory when trying to predict a protein structure. The command and its full output are below:
boltz predict TEST_PROTEIN.fasta --use_msa_server
Checking input data.
All inputs are already processed.
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/antigenteam/miniconda3/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:76: Starting from v1.9.0, tensorboardX has been removed as a dependency of the pytorch_lightning package, due to potential conflicts with other packages in the ML ecosystem. For this reason, logger=True will use CSVLogger as the default logger, unless the tensorboard or tensorboardX packages are found. Please pip install lightning[extra] or one of them to enable TensorBoard support by default
Running structure prediction for 1 input.
/home/antigenteam/miniconda3/lib/python3.12/site-packages/pytorch_lightning/utilities/migration/utils.py:56: The loaded checkpoint was produced with Lightning v2.5.0.post0, which is newer than your current Lightning version: v2.5.0
You are using a CUDA device ('NVIDIA GeForce RTX 4080 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]| WARNING: ran out of memory, skipping batch
Predicting DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 0.56it/s]Number of failed examples: 1
Predicting DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 0.56it/s]
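While monitoring the GPU, I see memory usage spike to 100% during the first prediction step, which is when the "ran out of memory, skipping batch" warning appears. Is there a way to reduce the memory the prediction requires so that it fits on this GPU?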
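For anyone who wants to reproduce the memory observation, below is a minimal sketch of how the usage can be polled from a second terminal while the prediction runs. It assumes `nvidia-smi` is on the PATH. The function names (`parse_memory_csv`, `query_gpu_memory`, `watch_gpu_memory`) are made up for illustration and are not part of boltz.

```python
import subprocess
import time


def parse_memory_csv(text):
    """Parse 'used, total' rows in MiB (one row per GPU) into
    a list of (used_mib, total_mib, percent_used) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used_str, total_str = (field.strip() for field in line.split(","))
        used, total = int(used_str), int(total_str)
        rows.append((used, total, 100.0 * used / total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for current memory usage (assumes nvidia-smi is installed)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)


def watch_gpu_memory(interval_s=0.5, samples=60):
    """Print memory usage for every visible GPU at a fixed interval."""
    for _ in range(samples):
        for index, (used, total, pct) in enumerate(query_gpu_memory()):
            print(f"GPU {index}: {used} / {total} MiB ({pct:.1f}%)")
        time.sleep(interval_s)
```

Calling `watch_gpu_memory()` just before starting `boltz predict` should show whether the spike happens immediately or builds up over the first step.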