warp-transducer
A problem with training an RNN-T model
Hi, I'm having a problem training a Conformer+RNN-T model. What CER and WER should I expect with one GPU?
I'm training the model on one RTX TITAN GPU: a Conformer (16 encoder layers, encoder dim 144, 1 decoder layer, decoder dim 320) on LibriSpeech 960h. After 50 epochs of training the CER is about 27 and doesn't decrease any further.
Could you tell me why?
Hello, we have a similar implementation within SpeechBrain (implemented with Python/Numba); you can take a look at the recipe: https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transducer and the transducer loss implementation: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/nnet/loss/transducer_loss.py
Otherwise, torchaudio supports this loss as of torchaudio 1.0; check: https://pytorch.org/audio/stable/functional.html#rnnt-loss
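For reference, here is a minimal sketch of how that torchaudio loss can be called; all tensor shapes, sizes, and the blank index below are illustrative assumptions, not values from this thread.

```python
# Minimal usage sketch of torchaudio.functional.rnnt_loss.
# All shapes/sizes are illustrative assumptions, not values from this thread.
import torch
import torchaudio.functional as F

batch, max_T, max_U, vocab = 2, 50, 10, 29  # hypothetical batch, frames, labels, vocab (blank included)

# Joiner output: (batch, max_T, max_U + 1, vocab); applying log_softmax keeps the
# input valid whether the backend expects raw logits or log-probabilities.
logits = torch.randn(batch, max_T, max_U + 1, vocab, requires_grad=True)
targets = torch.randint(1, vocab, (batch, max_U), dtype=torch.int32)  # label indices, no blank (blank = 0 here)
logit_lengths = torch.full((batch,), max_T, dtype=torch.int32)        # valid encoder frames per utterance
target_lengths = torch.full((batch,), max_U, dtype=torch.int32)       # valid labels per utterance

loss = F.rnnt_loss(
    logits.log_softmax(dim=-1),
    targets,
    logit_lengths,
    target_lengths,
    blank=0,
    reduction="mean",
)
loss.backward()
print(loss.item())
```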
We are also working on supporting the torchaudio loss within SpeechBrain; see: https://github.com/speechbrain/speechbrain/pull/1199
Hi, thank you! I will try it later.
There is also an implementation at https://github.com/csukuangfj/optimized_transducer that uses less GPU memory.
Hi @scufan1990, did you resolve the issue? I have some training runs in which the RNN-T loss stops decreasing.