
Problem training an RNN-T model

[Open] scufan1990 opened this issue 3 years ago • 4 comments

Hi, I have a problem training a Conformer+RNN-T model. What CER and WER should I expect with one GPU?

I'm training the model on one RTX TITAN GPU: a Conformer (16 encoder layers, encoder dim 144, 1 decoder layer, decoder dim 320) on LibriSpeech 960h. After 50 epochs of training, the CER is about 27 and doesn't decrease any further.

Could you tell me why?

scufan1990 avatar Dec 15 '21 14:12 scufan1990

Hello, we have a similar implementation within SpeechBrain (implemented with Python-Numba); you can take a look at https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transducer

Transducer implementation: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/nnet/loss/transducer_loss.py

Otherwise, torchaudio supports this loss as of torchaudio 1.0; check: https://pytorch.org/audio/stable/functional.html#rnnt-loss
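For reference, a minimal sketch of calling torchaudio's rnnt_loss on dummy tensors; the toy shapes and values below are illustrative assumptions, not settings from this thread:

```python
import torch
import torchaudio.functional as F

# Joint-network output as raw logits: (batch, time, target_len + 1, vocab).
logits = torch.randn(1, 50, 11, 29, requires_grad=True)
# Target token ids, excluding blank: (batch, target_len), int32 required.
targets = torch.randint(1, 29, (1, 10), dtype=torch.int32)
logit_lengths = torch.tensor([50], dtype=torch.int32)   # frames per utterance
target_lengths = torch.tensor([10], dtype=torch.int32)  # tokens per utterance

loss = F.rnnt_loss(logits, targets, logit_lengths, target_lengths,
                   blank=0, reduction="mean")
loss.backward()
```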

We are working on supporting the torchaudio implementation within SpeechBrain as well; see: https://github.com/speechbrain/speechbrain/pull/1199

aheba avatar Dec 15 '21 15:12 aheba

Hi, thank you! I will try it later.

flp1990 avatar Dec 16 '21 06:12 flp1990

There is also an implementation at https://github.com/csukuangfj/optimized_transducer that uses less GPU memory.
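The memory saving in optimized_transducer comes from the input layout: instead of a padded 4-D joint tensor, the loss takes logits flattened over only the valid (t, u) positions of each utterance. The sketch below is based on my reading of the project's README; the argument names and the from_log_softmax flag are assumptions to verify against the repo:

```python
import torch
import optimized_transducer

T, U, V = 50, 10, 29  # frames, target tokens, vocab size (toy values)

# Flattened joint output: (sum over batch of T_i * (U_i + 1), vocab),
# i.e. only valid positions, no padding -- this is where memory is saved.
logits = torch.randn(T * (U + 1), V, requires_grad=True)
targets = torch.randint(1, V, (1, U), dtype=torch.int32)
logit_lengths = torch.tensor([T], dtype=torch.int32)
target_lengths = torch.tensor([U], dtype=torch.int32)

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=0,
    reduction="mean",
    from_log_softmax=False,  # assumption: let the kernel apply log-softmax itself
)
```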

csukuangfj avatar Jan 11 '22 07:01 csukuangfj

Hi @scufan1990, did you resolve the issue? I have some training runs in which the RNN-T loss stops decreasing.

yufang67 avatar Aug 02 '22 14:08 yufang67