icefall

[WIP] RNN-T + MBR training.

Open pkufool opened this issue 3 years ago • 3 comments

This PR depends on https://github.com/k2-fsa/k2/pull/1057 in k2.

pkufool, Sep 29 '22 03:09

The model structure is shown in the diagram below. It has two joiners: the regular RNN-T joiner, and a quasi-joiner that produces the expected WER. To make the quasi-joiner work well, it takes an enhanced embedding instead of the encoder output. The embedding enhancer is a model that applies self-attention over masked_encoder_output and cross-attention to text_embedding, which is produced by a transformer LM.

[Image: model structure diagram]
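For context on the "expected WER" the quasi-joiner produces: MBR training minimizes the expected risk E[WER] = Σ_h P(h | x) · WER(h, y*), where y* is the reference transcript. The sketch below computes that quantity over an N-best list in plain Python. It is an illustration only, not code from this PR: the N-best approximation, the softmax renormalization of hypothesis scores, and the helper names `edit_distance` and `expected_wer` are all assumptions. The PR itself builds on k2 (see the linked k2 PR) and may compute the expectation differently, for example over a lattice.

```python
import math
from typing import List, Sequence, Tuple


# Hypothetical helper for illustration; not part of icefall or k2.
def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between word sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)  # vs. deletion / insertion
        prev = cur
    return prev[-1]


# Hypothetical helper for illustration; assumes an N-best approximation of P(h | x).
def expected_wer(ref: Sequence[str], nbest: List[Tuple[Sequence[str], float]]) -> float:
    """Expected WER over an N-best list of (hypothesis_words, log_score) pairs.

    Scores are renormalized with a softmax over the list, so P(h) is the
    posterior restricted to the N-best hypotheses.
    """
    if not ref:
        raise ValueError("reference must be non-empty")
    log_scores = [score for _, score in nbest]
    m = max(log_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return sum(
        (w / total) * edit_distance(ref, hyp) / len(ref)
        for (hyp, _), w in zip(nbest, weights)
    )


if __name__ == "__main__":
    ref = "the cat sat".split()
    nbest = [
        ("the cat sat".split(), math.log(0.6)),
        ("the cat sad".split(), math.log(0.3)),
        ("a cat".split(), math.log(0.1)),
    ]
    print(expected_wer(ref, nbest))
```

In this example the three hypotheses have posteriors 0.6, 0.3, 0.1 and WERs 0, 1/3, 2/3, so the expected WER is 0.3 · (1/3) + 0.1 · (2/3) = 1/6 ≈ 0.167. Unlike the 1-best WER, this quantity is a smooth function of the model scores, which is what makes it usable as a training objective.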

pkufool, Dec 08 '22 07:12

@danpovey @yaozengwei @glynpu Would you please have a look at this? If anything is unclear, please let me know. Thanks!

pkufool, Dec 08 '22 08:12

Sure. I will have a look.

yaozengwei, Dec 08 '22 08:12