icefall
Add convrnnt.py
Add a ConvRNN-T encoder, from "ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition" (https://arxiv.org/pdf/2209.14868.pdf).
Model size: 44M parameters.
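For readers unfamiliar with the architecture, below is a minimal sketch of what a ConvRNN-T-style encoder block can look like: a causal depthwise convolution interleaved with a unidirectional LSTM, which is the general idea of augmenting a recurrent transducer encoder with convolution for streaming ASR. This is not the actual convrnnt.py added in this PR; all module names, dimensions, and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only -- not the convrnnt.py from this PR.
import torch
import torch.nn as nn


class ConvRNNTLayer(nn.Module):
    """One conv + LSTM block operating on (N, T, C) features."""

    def __init__(self, d_model: int = 512, kernel_size: int = 31, dropout: float = 0.1):
        super().__init__()
        # Causal left padding so the convolution sees no future frames,
        # keeping the layer usable for streaming.
        self.pad = nn.ConstantPad1d((kernel_size - 1, 0), 0.0)
        # Depthwise 1-D convolution over time captures local context.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.norm = nn.LayerNorm(d_model)
        # Unidirectional LSTM models long-range left context.
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, T, C)
        residual = x
        y = self.conv(self.pad(x.transpose(1, 2))).transpose(1, 2)  # (N, T, C)
        y = self.norm(residual + self.dropout(y))
        y, _ = self.lstm(y)
        return residual + self.dropout(y)


class ConvRNNTEncoder(nn.Module):
    """Stack of ConvRNNTLayer blocks on top of a simple input projection."""

    def __init__(self, num_features: int = 80, d_model: int = 512, num_layers: int = 6):
        super().__init__()
        self.input_proj = nn.Linear(num_features, d_model)
        self.layers = nn.ModuleList(ConvRNNTLayer(d_model) for _ in range(num_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, T, num_features) filter-bank features.
        y = self.input_proj(x)
        for layer in self.layers:
            y = layer(y)
        return y  # (N, T, d_model), consumed by the transducer joiner/decoder.


if __name__ == "__main__":
    feats = torch.randn(2, 100, 80)  # (batch, frames, mel bins)
    out = ConvRNNTEncoder()(feats)
    print(out.shape)  # torch.Size([2, 100, 512])
```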
The best WER on LibriSpeech 960h within 20 epochs (epoch-20, avg-4, modified_beam_search, beam-size-4, use-averaged-model) is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 5.01       | 11.92      |
| Model      | Clean | Other | Size (M) |
|------------|-------|-------|----------|
| RNN-T      | 5.9   | 15.71 | 30       |
| Conformer  | 5.7   | 14.24 | 29       |
| ContextNet | 6.02  | 14.42 | 28       |
| ConvRNN-T  | 5.11  | 13.82 | 29       |
The WERs reported in the paper seem a lot worse than those in the original Conformer/ContextNet papers. Any idea why that is?
I can't reproduce Google's setup, so I have no idea.