
CUDA out of memory

sepehr3pehr opened this issue 3 years ago · 3 comments

Hi,

I am trying to train a Conformer-Transducer model on LibriSpeech, but I am getting an out-of-memory error. I am using the conformer_transducer_bpe.yaml config to initialize an EncDecRNNTBPEModel (I also tried the char version) with batch_size=8 on an NVIDIA K80 GPU with 12 GB of memory, and this is the error I get:

```
...
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/nemo/collections/asr/modules/rnnt.py", line 992, in joint
    res = self.joint_net(inp)  # [B, T, U, V + 1]
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/nemo/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 2.16 GiB (GPU 0; 11.17 GiB total capacity; 8.88 GiB already allocated; 1.76 GiB free; 9.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

I managed to train an EncDecCTCModelBPE model with a much larger number of parameters on the same dataset, using a larger batch size, so I am not sure why I am getting a memory error for a smaller model.

sepehr3pehr · Jul 26 '22
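For reference, a minimal sketch of the setup described above, assuming the standard NeMo config layout; the manifest, tokenizer paths, and trainer settings below are placeholders, not values taken from the issue:

```python
# Rough reconstruction of the reported setup; paths and trainer settings are
# placeholders, not values from the issue.
import pytorch_lightning as pl
from omegaconf import OmegaConf

import nemo.collections.asr as nemo_asr

cfg = OmegaConf.load("conformer_transducer_bpe.yaml")
cfg.model.train_ds.manifest_filepath = "librispeech_train_manifest.json"     # placeholder
cfg.model.validation_ds.manifest_filepath = "librispeech_dev_manifest.json"  # placeholder
cfg.model.tokenizer.dir = "tokenizer_dir"                                    # placeholder
cfg.model.train_ds.batch_size = 8  # the batch size that runs out of memory on the 12 GB K80

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=100)
model = nemo_asr.models.EncDecRNNTBPEModel(cfg=cfg.model, trainer=trainer)
trainer.fit(model)
```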

You might need to use a smaller batch size of 4 and use gradient accumulation instead. The RNNT model takes much more memory than CTC.

titu1994 · Jul 27 '22
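A rough sketch of that suggestion, continuing from the config object above (the accumulation factor of 2 is illustrative): halving the per-step batch size while accumulating gradients over two steps keeps the effective batch size at 8 but lowers peak activation memory.

```python
# Halve the per-step batch size and accumulate gradients over two steps so the
# effective batch size stays at 8 while peak memory per step drops.
cfg.model.train_ds.batch_size = 4
cfg.trainer.accumulate_grad_batches = 2

trainer = pl.Trainer(**cfg.trainer)
```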

You may also reduce fused_batch_size to lower memory consumption.

VahidooX · Jul 27 '22
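For reference, fused_batch_size sits under the joint section of the same config; a sketch of the override, with the value 4 chosen only for illustration:

```python
# With fuse_loss_wer enabled, the joint network, loss, and WER are computed over
# sub-batches of this size; a smaller value shrinks the peak size of the
# [B, T, U, V + 1] joint tensor at some speed cost.
cfg.model.joint.fused_batch_size = 4
```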

A batch size of 2 does not throw an error. I think I have to stick with that. Thanks.

sepehr3pehr · Jul 29 '22

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] · Sep 28 '22

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] · Oct 06 '22