icefall
[WIP] RNN-T + MBR training.
This PR depends on https://github.com/k2-fsa/k2/pull/1057 in k2.
The model structure is shown in the diagram below. It has two joiners: one is the regular RNN-T joiner, and the other is a quasi-joiner that predicts the expected WER. To make the quasi-joiner work well, we feed it an enhanced embedding instead of the raw encoder output. The embedding enhancer is a model with self-attention over the masked_encoder_output and cross-attention into the text_embedding produced by a transformer LM.
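A minimal, hypothetical sketch of the structure described above (the class and variable names here are my own illustration, not the PR's actual code): an embedding enhancer with self-attention over the masked encoder output and cross-attention into LM text embeddings, plus two joiners sharing the same broadcast-add pattern, where the quasi-joiner emits a scalar expected-WER estimate per (t, u) position.

```python
import torch
import torch.nn as nn


class EmbeddingEnhancer(nn.Module):
    """Self-attention over the masked encoder output, then cross-attention
    into text embeddings produced by an external transformer LM (assumed)."""

    def __init__(self, d_model: int, nhead: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, masked_encoder_out, text_embedding):
        # Self-attention over the (N, T, C) masked encoder output.
        x, _ = self.self_attn(
            masked_encoder_out, masked_encoder_out, masked_encoder_out
        )
        # Cross-attention: queries from the encoder side, keys/values from
        # the (N, U, C) LM text embeddings.
        x, _ = self.cross_attn(x, text_embedding, text_embedding)
        return x


class Joiner(nn.Module):
    """Standard RNN-T style joiner: broadcast-add encoder and decoder
    frames, then project to the output dimension."""

    def __init__(self, d_model: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(d_model, out_dim)

    def forward(self, enc, dec):
        # enc: (N, T, C) -> (N, T, 1, C); dec: (N, U, C) -> (N, 1, U, C)
        return self.proj(torch.tanh(enc.unsqueeze(2) + dec.unsqueeze(1)))


N, T, U, C, V = 2, 5, 3, 16, 10
enhancer = EmbeddingEnhancer(C)
rnnt_joiner = Joiner(C, V)   # regular RNN-T joiner -> vocab logits
quasi_joiner = Joiner(C, 1)  # quasi-joiner -> scalar expected WER per (t, u)

enc_out = torch.randn(N, T, C)   # stands in for the (masked) encoder output
dec_out = torch.randn(N, U, C)   # stands in for the prediction-network output
text_emb = torch.randn(N, U, C)  # from a transformer LM (assumption)

enhanced = enhancer(enc_out, text_emb)
logits = rnnt_joiner(enc_out, dec_out)          # (N, T, U, V)
expected_wer = quasi_joiner(enhanced, dec_out)  # (N, T, U, 1)
```

The key design point, as I read the PR, is that only the quasi-joiner consumes the enhanced embedding; the regular RNN-T joiner keeps its usual encoder input, so MBR-style expected-WER training can be added without disturbing the standard transducer path.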
@danpovey @yaozengwei @glynpu Would you please have a look at this? If anything is unclear, please let me know. Thanks!
Sure. I will have a look.