
Problem training an RNN-T model

[Open] scufan1990 opened this issue 3 years ago • 4 comments

Hi, I have a problem training a Conformer+RNN-T model. What CER and WER should I expect with one GPU?

I'm training the model on one RTX TITAN GPU: a Conformer (16 encoder layers, encoder dim 144, 1 decoder layer, decoder dim 320) on LibriSpeech 960h. After 50 epochs of training, the CER is about 27 and doesn't decrease any further.

Could you tell me why?

scufan1990 avatar Dec 15 '21 14:12 scufan1990

Hello, we have a similar implementation within SpeechBrain (implemented with Python-Numba); you can take a look at https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transducer

Transducer implementation: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/nnet/loss/transducer_loss.py

Otherwise, torchaudio supports this loss as of torchaudio 1.0; check: https://pytorch.org/audio/stable/functional.html#rnnt-loss
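For reference, a minimal sketch of calling torchaudio's rnnt_loss on dummy tensors; the toy shapes and values below are illustrative assumptions, not settings from this thread:

```python
import torch
import torchaudio.functional as F

# Joint-network output as raw logits: (batch, time, target_len + 1, vocab).
logits = torch.randn(1, 50, 11, 29, requires_grad=True)
# Target token ids, excluding blank: (batch, target_len), int32 required.
targets = torch.randint(1, 29, (1, 10), dtype=torch.int32)
logit_lengths = torch.tensor([50], dtype=torch.int32)   # frames per utterance
target_lengths = torch.tensor([10], dtype=torch.int32)  # tokens per utterance

loss = F.rnnt_loss(logits, targets, logit_lengths, target_lengths,
                   blank=0, reduction="mean")
loss.backward()
```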

We are working on supporting the torchaudio implementation within SpeechBrain as well; see: https://github.com/speechbrain/speechbrain/pull/1199

aheba avatar Dec 15 '21 15:12 aheba

Hi, thank you! I will try it later.

flp1990 avatar Dec 16 '21 06:12 flp1990

There is also an implementation at https://github.com/csukuangfj/optimized_transducer that uses less GPU memory.
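The memory saving in optimized_transducer comes from the input layout: instead of a padded 4-D joint tensor, the loss takes logits flattened over only the valid (t, u) positions of each utterance. The sketch below is based on my reading of the project's README; the argument names and the from_log_softmax flag are assumptions to verify against the repo:

```python
import torch
import optimized_transducer

T, U, V = 50, 10, 29  # frames, target tokens, vocab size (toy values)

# Flattened joint output: (sum over batch of T_i * (U_i + 1), vocab),
# i.e. only valid positions, no padding -- this is where memory is saved.
logits = torch.randn(T * (U + 1), V, requires_grad=True)
targets = torch.randint(1, V, (1, U), dtype=torch.int32)
logit_lengths = torch.tensor([T], dtype=torch.int32)
target_lengths = torch.tensor([U], dtype=torch.int32)

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=0,
    reduction="mean",
    from_log_softmax=False,  # assumption: let the kernel apply log-softmax itself
)
```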

csukuangfj avatar Jan 11 '22 07:01 csukuangfj

Hi @scufan1990, did you resolve the issue? I have some training runs in which the RNN-T loss stops decreasing.

yufang67 avatar Aug 02 '22 14:08 yufang67