ContextNet-Transducer: TypeError: forward() got an unexpected keyword argument 'input_lengths'
Hello! I am trying to run the contextnet_transducer model with:
nohup python ./openspeech_cli/hydra_train.py dataset=librispeech dataset.dataset_download=False dataset.dataset_path="../..//database/LibriSpeech/" dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_subword_manifest.txt" tokenizer=libri_subword model=contextnet_transducer audio=fbank lr_scheduler=warmup trainer=gpu criterion=cross_entropy
Yet training fails to start with the following error:
self.advance(*args, **kwargs)
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 239, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/dp.py", line 104, in validation_step
return self.model(*args, **kwargs)
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/overrides/data_parallel.py", line 63, in forward
output = super().forward(*inputs, **kwargs)
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 92, in forward
output = self.module.validation_step(*inputs, **kwargs)
File "/remote/idiap.svm/temp.speech05/inigmatulina/work/experiments/contextnet/openspeech/openspeech/models/openspeech_transducer_model.py", line 258, in validation_step
return self.collect_outputs(
File "/remote/idiap.svm/temp.speech05/inigmatulina/work/experiments/contextnet/openspeech/openspeech/models/openspeech_transducer_model.py", line 94, in collect_outputs
loss = self.criterion(
File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'input_lengths'
I have checked the openspeech/decoders/rnn_transducer_decoder.py script: in its forward function, input_lengths is accepted as an argument but never used afterwards... I am not sure how it should be handled.
Thank you! Yulia
@upskyy
Hello @yunigma !
I think the error comes from using the cross_entropy criterion with a transducer model. Could you try setting the criterion to transducer?
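For reference, here is a minimal standalone sketch of the mismatch (illustrative shapes and names only, using torchaudio's rnnt_loss as a stand-in transducer criterion, not OpenSpeech's actual implementation): the transducer model passes input_lengths and target_lengths to its criterion, which torch.nn.CrossEntropyLoss.forward does not accept, while an RNN-T loss requires them.

```python
import torch
import torch.nn as nn
from torchaudio.functional import rnnt_loss  # stand-in for a transducer criterion

batch, max_t, max_u, vocab = 4, 10, 5, 32
logits = torch.randn(batch, max_t, max_u, vocab)  # joint-network output (B, T, U, V)
targets = torch.randint(1, vocab, (batch, max_u - 1), dtype=torch.int32)
input_lengths = torch.full((batch,), max_t, dtype=torch.int32)
target_lengths = torch.full((batch,), max_u - 1, dtype=torch.int32)

# CrossEntropyLoss.forward only accepts (input, target), so the extra
# length keywords raise the same TypeError as in the traceback above.
try:
    nn.CrossEntropyLoss()(
        logits, targets,
        input_lengths=input_lengths,
        target_lengths=target_lengths,
    )
except TypeError as e:
    print(e)  # forward() got an unexpected keyword argument 'input_lengths'

# A transducer (RNN-T) loss, by contrast, needs both length tensors.
loss = rnnt_loss(logits, targets, input_lengths, target_lengths, blank=0)
print(loss)
```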
Hello @upskyy !
Thank you very much for your response. Changing cross_entropy to transducer fixed the issue reported above. I managed to start training, yet the training loss is very high (>200) and keeps growing. I have tried a different lr_scheduler, but the problem does not seem to be there.
I attach the logs.
logs_20220214.log
Would you like to experiment with the gradient accumulation parameter? [link]
I think your batch size is 16, so it might be a good idea to set accumulate_grad_batches to 8.
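For reference, a minimal illustration of what that setting does (plain PyTorch Lightning, not OpenSpeech-specific code): gradients are accumulated over several batches before each optimizer step, so with a per-step batch size of 16 and accumulate_grad_batches=8 the effective batch size becomes 128.

```python
from pytorch_lightning import Trainer

# Accumulate gradients over 8 batches before each optimizer step;
# with a per-step batch size of 16 this gives an effective batch of 128.
trainer = Trainer(accumulate_grad_batches=8)
```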
Hello, @upskyy ! I have tried setting accumulate_grad_batches to 8, but it made the loss grow even faster...
@yunigma I'll have to do some more testing. I'm so sorry... 😭
Thank you @upskyy !! No worries, it is a very cool project. I will keep trying to understand the issue on my side too.
Same issue here with these hparams:
dataset=librispeech \
tokenizer=libri_subword \
model=contextnet_transducer \
audio=fbank \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu \
criterion=transducer \
trainer.sampler=smart \
trainer.batch_size=4 \
trainer.accumulate_grad_batches=8