
ContextNet-Transducer: TypeError: forward() got an unexpected keyword argument 'input_lengths'

yunigma opened this issue 3 years ago • 8 comments

Hello! I am trying to run contextnet_transducer with the following command:

nohup python ./openspeech_cli/hydra_train.py dataset=librispeech dataset.dataset_download=False dataset.dataset_path="../..//database/LibriSpeech/" dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_subword_manifest.txt" tokenizer=libri_subword model=contextnet_transducer audio=fbank lr_scheduler=warmup trainer=gpu criterion=cross_entropy

However, training fails to start with the following error:

    self.advance(*args, **kwargs)
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 239, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/dp.py", line 104, in validation_step
    return self.model(*args, **kwargs)
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/overrides/data_parallel.py", line 63, in forward
    output = super().forward(*inputs, **kwargs)
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 92, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "/remote/idiap.svm/temp.speech05/inigmatulina/work/experiments/contextnet/openspeech/openspeech/models/openspeech_transducer_model.py", line 258, in validation_step
    return self.collect_outputs(
  File "/remote/idiap.svm/temp.speech05/inigmatulina/work/experiments/contextnet/openspeech/openspeech/models/openspeech_transducer_model.py", line 94, in collect_outputs
    loss = self.criterion(
  File "/idiap/temp/inigmatulina/code/miniconda3/envs/ve-openspeech/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'input_lengths' 

I have checked the openspeech/decoders/rnn_transducer_decoder.py script and, in its forward function, input_lengths is accepted as an argument but never used afterwards... I am not sure whether that is intended...
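For anyone trying to understand the TypeError itself: the traceback ends in `self.criterion(...)`, so the failing call is the loss function, not the decoder. A cross-entropy-style criterion only accepts logits and targets, while the transducer model passes sequence lengths as well. A minimal sketch of the mismatch, using hypothetical stand-in classes (not OpenSpeech's actual code, and no torch dependency):

```python
# Hypothetical stand-ins illustrating the signature mismatch behind the
# TypeError. Class names are illustrative only.

class CrossEntropyLike:
    """Mimics nn.CrossEntropyLoss: forward takes only (logits, targets)."""
    def forward(self, logits, targets):
        return 0.0

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class TransducerLike:
    """Mimics a transducer (RNN-T) loss: it also needs the lengths."""
    def forward(self, logits, targets, input_lengths, target_lengths):
        return 0.0

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


# The transducer model always passes the length tensors to its criterion.
kwargs = dict(logits=None, targets=None,
              input_lengths=[10], target_lengths=[4])

try:
    CrossEntropyLike()(**kwargs)
except TypeError as e:
    print(e)  # mentions: unexpected keyword argument 'input_lengths'

TransducerLike()(**kwargs)  # fine: the signature matches the call
```

So the decoder's unused input_lengths argument is a red herring; the real issue is the criterion configured for the run.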

Thank you! Yulia

yunigma avatar Feb 13 '22 21:02 yunigma

@upskyy

sooftware avatar Feb 14 '22 04:02 sooftware

Hello @yunigma ! I think the error comes from using the cross_entropy criterion with a transducer model. Could you try using the transducer criterion instead?

upskyy avatar Feb 14 '22 09:02 upskyy

Hello @upskyy ! Thank you very much for your response. Changing cross_entropy to transducer fixed the issue reported above. I managed to start training, yet the training loss is very high (>200) and keeps growing. I have tried a different lr_scheduler, but the problem does not seem to be there. I attach the logs. logs_20220214.log
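For reference, the only change needed relative to the launch command in the first post is the criterion flag (all other flags reproduced verbatim from above):

```shell
# Same launch command as in the original report, with the criterion
# switched from cross_entropy to transducer (the RNN-T loss expects
# input_lengths and target_lengths, which the transducer model passes).
python ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=False \
    dataset.dataset_path="../..//database/LibriSpeech/" \
    dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_subword_manifest.txt" \
    tokenizer=libri_subword \
    model=contextnet_transducer \
    audio=fbank \
    lr_scheduler=warmup \
    trainer=gpu \
    criterion=transducer
```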

yunigma avatar Feb 14 '22 11:02 yunigma

Could you experiment with the gradient accumulation parameter? [link] Since your batch size is 16, it might be worth setting accumulate_grad_batches to 8.
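To clarify what that flag does: PyTorch Lightning sums gradients over several micro-batches and only then runs one optimizer step, so the effective batch size becomes batch_size × accumulate_grad_batches. A conceptual sketch in plain Python (no torch; the variables are illustrative, not Lightning internals):

```python
# Conceptual sketch of accumulate_grad_batches: gradients from several
# micro-batches are accumulated before a single optimizer step.

batch_size = 16
accumulate_grad_batches = 8
effective_batch = batch_size * accumulate_grad_batches  # 16 * 8 = 128

grads = []        # stands in for the accumulated gradient buffers
steps = 0         # counts optimizer steps actually taken

for i, batch_grad in enumerate([1.0] * 32):  # 32 micro-batches
    grads.append(batch_grad)                 # loss.backward() accumulates
    if (i + 1) % accumulate_grad_batches == 0:
        steps += 1                           # optimizer.step() runs here
        grads.clear()                        # optimizer.zero_grad()

print(effective_batch, steps)  # 128 4
```

A larger effective batch usually smooths noisy transducer losses, though as the next comment shows, it did not help in this case.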

upskyy avatar Feb 15 '22 02:02 upskyy

Hello, @upskyy ! I have tried setting accumulate_grad_batches to 8, but it made the loss grow even faster...

yunigma avatar Feb 15 '22 13:02 yunigma

@yunigma I'll have to do some more testing. I'm so sorry... 😭

upskyy avatar Feb 16 '22 12:02 upskyy

Thank you @upskyy !! No worries. It is a very cool project anyway. I keep trying to understand the issue on my side too.

yunigma avatar Feb 16 '22 12:02 yunigma

[Screenshot attached: Capture d’écran 2022-04-06 à 11 00 23]

Same issue here with these hparams:

    dataset=librispeech \
    tokenizer=libri_subword \
    model=contextnet_transducer \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=transducer \
    trainer.sampler=smart \
    trainer.batch_size=4 \
    trainer.accumulate_grad_batches=8

virgile-blg avatar Apr 06 '22 09:04 virgile-blg