
CUDA out of memory

sepehr3pehr opened this issue 3 years ago · 3 comments

Hi,

I am trying to train a Conformer-Transducer model on LibriSpeech, but I am getting an out-of-memory error. I am using the conformer_transducer_bpe.yaml config to initialize an EncDecRNNTBPEModel (I also tried the char version) with batch_size=8 on an NVIDIA K80 GPU with 12 GB of memory, and this is the error I get:

```
...
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/nemo/collections/asr/modules/rnnt.py", line 992, in joint
    res = self.joint_net(inp)  # [B, T, U, V + 1]
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 2.16 GiB (GPU 0; 11.17 GiB total capacity; 8.88 GiB already allocated; 1.76 GiB free; 9.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

I managed to train an EncDecCTCModelBPE model with a much larger number of parameters on the same dataset, using a larger batch size, so I am not sure why I am getting a memory error for a smaller model.

sepehr3pehr · Jul 26 '22
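For reference, a minimal sketch of the setup described above, assuming the standard NeMo config layout; the manifest, tokenizer paths, and trainer settings below are placeholders, not values taken from the issue:

```python
# Rough reconstruction of the reported setup; paths and trainer settings are
# placeholders, not values from the issue.
import pytorch_lightning as pl
from omegaconf import OmegaConf

import nemo.collections.asr as nemo_asr

cfg = OmegaConf.load("conformer_transducer_bpe.yaml")
cfg.model.train_ds.manifest_filepath = "librispeech_train_manifest.json"     # placeholder
cfg.model.validation_ds.manifest_filepath = "librispeech_dev_manifest.json"  # placeholder
cfg.model.tokenizer.dir = "tokenizer_dir"                                    # placeholder
cfg.model.train_ds.batch_size = 8  # the batch size that runs out of memory on the 12 GB K80

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=100)
model = nemo_asr.models.EncDecRNNTBPEModel(cfg=cfg.model, trainer=trainer)
trainer.fit(model)
```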

You might need to use a smaller batch size of 4 and use gradient accumulation instead. The RNNT model takes much more memory than CTC.

titu1994 · Jul 27 '22
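A rough sketch of that suggestion, continuing from the config object above (the accumulation factor of 2 is illustrative): halving the per-step batch size while accumulating gradients over two steps keeps the effective batch size at 8 but lowers peak activation memory.

```python
# Halve the per-step batch size and accumulate gradients over two steps so the
# effective batch size stays at 8 while peak memory per step drops.
cfg.model.train_ds.batch_size = 4
cfg.trainer.accumulate_grad_batches = 2

trainer = pl.Trainer(**cfg.trainer)
```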

You may also reduce fused_batch_size to lower memory consumption.

VahidooX · Jul 27 '22
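For reference, fused_batch_size sits under the joint section of the same config; a sketch of the override, with the value 4 chosen only for illustration:

```python
# With fuse_loss_wer enabled, the joint network, loss, and WER are computed over
# sub-batches of this size; a smaller value shrinks the peak size of the
# [B, T, U, V + 1] joint tensor at some speed cost.
cfg.model.joint.fused_batch_size = 4
```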

A batch size of 2 does not throw an error. I think I have to stick with that. Thanks.

sepehr3pehr · Jul 29 '22

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] · Sep 28 '22

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] · Oct 06 '22