mm-cot
torch.cuda.OutOfMemoryError: CUDA out of memory.
GPU Info
$ nvidia-smi
Thu Feb 23 06:54:18 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Command to run
CUDA_VISIBLE_DEVICES=0 python main.py \
--model allenai/unifiedqa-t5-base \
--user_msg rationale --img_type detr \
--bs 8 --eval_bs 4 --eval_acc 10 --output_len 512 \
--final_eval --prompt_format QCM-LE
Error message
[06:54:23] [Model]: Loading allenai/unifiedqa-t5-base... main.py:68
[Data]: Reading data... main.py:69
Some weights of T5ForMultimodalGeneration were not initialized from the model checkpoint at allenai/unifiedqa-t5-base and are newly initialized: ['mha_layer.out_proj.weight', 'image_dense.weight', 'mha_layer.in_proj_bias', 'image_dense.bias', 'mha_layer.in_proj_weight', 'gate_dense.bias', 'mha_layer.out_proj.bias', 'gate_dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model parameters: 226643712
***** Running training *****
  Num examples = 12726
  Num Epochs = 20
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 31820
  0%|          | 0/31820 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/test/deploy/mm-cot/main.py", line 380, in <module>
    T5Trainer(
  File "/home/test/deploy/mm-cot/main.py", line 269, in T5Trainer
    trainer.train()
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/trainer.py", line 2470, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/test/deploy/mm-cot/model.py", line 144, in forward
    decoder_outputs = self.decoder(
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1035, in forward
    layer_outputs = layer_module(
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 692, in forward
    cross_attention_outputs = self.layer[1](
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 606, in forward
    attention_output = self.EncDecAttention(
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 535, in forward
    attn_weights = nn.functional.dropout(
  File "/home/test/deploy/mm-cot/venv/lib/python3.9/site-packages/torch/nn/functional.py", line 1252, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 11.17 GiB total capacity; 10.70 GiB already allocated; 20.25 MiB free; 10.80 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
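A detail worth noting in the error text: 10.70 GiB is already allocated and 10.80 GiB is reserved, so reserved memory is not much larger than allocated memory, and fragmentation is unlikely to be the main culprit on this 11 GiB K80. Still, the allocator tweak the message suggests costs nothing to try. A minimal sketch, assuming the documented PYTORCH_CUDA_ALLOC_CONF allocator option (the 128 MiB split size is an arbitrary starting point, not a verified fix for this model):

# Assumption: set in the shell before launching training; tune the value as needed.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128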
One possible solution to the OutOfMemoryError is to lower the "bs" parameters passed in the launch command:

--bs 8 --eval_bs 4

"bs" stands for batch size; smaller values reduce peak GPU memory at the cost of more optimization steps per epoch, and may let training fit on this card.
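For example, the original command with smaller batch sizes (the exact values that fit in 11 GiB are an assumption; start small and increase until memory runs out again):

CUDA_VISIBLE_DEVICES=0 python main.py \
    --model allenai/unifiedqa-t5-base \
    --user_msg rationale --img_type detr \
    --bs 2 --eval_bs 2 --eval_acc 10 --output_len 512 \
    --final_eval --prompt_format QCM-LE

If the smaller batch hurts training dynamics, gradient accumulation is the usual companion technique; check whether main.py exposes such an option before relying on it.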