Cesc
> @Cescfangs yes

Thanks for the reply. I'm curious about the improvement from this mWER tuning — say, a 5% relative WER reduction?
https://github.com/hirofumi0810/neural_sp/blob/2b10b9cc4bdecb5180ecc45575c0ef410fb09aa3/neural_sp/models/criterion.py#L12-L39

Also, I am a little confused about the `mbr` loss: the inputs are not used in the backward function, so how does the gradient flow to the model parameters?
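For reference, here is a minimal sketch of the pattern as I understand it (my own reading, not the exact neural_sp class; names like `MBRLoss` and `grad_log_probs` are hypothetical). In a custom `torch.autograd.Function`, whatever tensor `backward` returns in the slot of a given `forward` input is used by autograd as the gradient w.r.t. that input, so a pre-computed gradient saved via `ctx.save_for_backward` still flows back into the model:

```python
import torch

class MBRLoss(torch.autograd.Function):
    """Hypothetical sketch: forward() caches a pre-computed
    d(risk)/d(log_probs); backward() hands it back to autograd."""

    @staticmethod
    def forward(ctx, log_probs, grad_log_probs, exp_risk):
        # grad_log_probs is computed outside autograd (e.g. from N-best
        # hypotheses and their risks) and stashed for the backward pass.
        ctx.save_for_backward(grad_log_probs)
        return exp_risk

    @staticmethod
    def backward(ctx, grad_output):
        (grad_log_probs,) = ctx.saved_tensors
        # The first returned tensor is treated as the gradient w.r.t.
        # `log_probs`, so it propagates into the encoder/decoder
        # parameters that produced log_probs. The other two inputs
        # need no gradient, hence None.
        return grad_log_probs * grad_output, None, None

# Usage: loss = MBRLoss.apply(log_probs, grad_log_probs, exp_risk)
#        loss.backward()  # model params receive grad_log_probs
```

So even though `backward` never touches the raw inputs, the returned tensor is attached to `log_probs`'s position in the graph, which is what carries the gradient to the parameters.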
streaming decoder
OK, but this will take some time
So I trained the MMA decoder with a very small dataset (1000 utts); the performance may not be good, but I think it's effective for debugging. Here is the acc plot and the...
I see, so what exactly is the intention behind `mocha_first_layer`?
Thank you, I'll re-check my implementation
I re-checked the plot_attention part and found it was not plotting the encoder-decoder attention weights; the x-axis is actually the attention dim (256). I'll update the attention plots later.
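For anyone checking the same thing, a minimal sketch (hypothetical function name and shapes, not ESPnet's own plotting code) of drawing the encoder-decoder attention so the x-axis is encoder frames rather than the attention dimension:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_src_attention(att: np.ndarray, out_path: str):
    """att: (n_heads, n_target_tokens, n_encoder_frames).
    With encoder frames on the x-axis, a roughly diagonal pattern
    indicates a healthy source-target alignment."""
    n_heads = att.shape[0]
    fig, axes = plt.subplots(1, n_heads, figsize=(4 * n_heads, 4),
                             squeeze=False)
    for h in range(n_heads):
        ax = axes[0][h]
        ax.imshow(att[h], aspect="auto", origin="lower")
        ax.set_xlabel("encoder frame")
        ax.set_ylabel("target token")
        ax.set_title(f"head {h}")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

The easy mistake is plotting a (target_len, d_model) activation instead of the (target_len, source_len) attention matrix, which is what produces an x-axis of 256 (the attention dim) instead of the encoder time axis.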
Here I give the last-layer encoder-decoder attention plots, since the first `mocha_first_layer - 1` layers have no `src_att`:

**ESPnet default non-streaming transformer decoder:** [attention plot]

**MMA decoder, ...**
* The plots above are from a very small dataset (1000 utts) trained for 1 epoch (for fast debugging); the performance may not be good, but I think it's effective for debugging.
* I also trained...