Results: 32 comments by Cesc

> @Cescfangs yes

Thanks for the reply. I'm curious about the improvement from this mWER tuning, say around a 5% relative WER reduction?

https://github.com/hirofumi0810/neural_sp/blob/2b10b9cc4bdecb5180ecc45575c0ef410fb09aa3/neural_sp/models/criterion.py#L12-L39 Also, I'm a little confused about the "mbr" loss: the inputs are not used in the backward function, so how does the gradient flow to the model parameters?
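Thinking it through, I guess the standard trick looks something like this (a minimal sketch of the usual MBR pattern, not the actual neural_sp code; the names and shapes here are assumptions):

```python
import torch


class MBRLoss(torch.autograd.Function):
    """Hypothetical minimum-risk loss sketch.

    The analytic gradient w.r.t. the hypothesis scores is computed in
    forward() and cached; backward() simply replays it. Autograd then
    routes that returned tensor back through the graph that produced
    `log_probs`, which is how the model parameters still get gradients
    even though backward() never differentiates its inputs directly.
    """

    @staticmethod
    def forward(ctx, log_probs, risks):
        # log_probs: (batch, n_hyps) hypothesis scores from the model
        # risks:     (batch, n_hyps) per-hypothesis errors, no grad needed
        probs = torch.softmax(log_probs, dim=-1)
        expected_risk = (probs * risks).sum(dim=-1)           # (batch,)
        # d E[risk] / d log_probs = p * (risk - E[risk])
        grad = probs * (risks - expected_risk.unsqueeze(-1))
        ctx.save_for_backward(grad)
        return expected_risk.mean()

    @staticmethod
    def backward(ctx, grad_output):
        (grad,) = ctx.saved_tensors
        # One return value per forward() input; `risks` gets None.
        # Dividing by batch size matches the .mean() in forward().
        return grad_output * grad / grad.size(0), None


# usage: `log_probs` must come out of the model so the graph exists
# loss = MBRLoss.apply(log_probs, risks); loss.backward()
```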

OK, but this will take some time

So I trained the MMA decoder with a very small dataset (1000 utts); the performance may not be good, but I think it's effective for debugging. Here is the acc plot and the...

I see, so what's the actual intention behind `mocha_first_layer`?

Thank you, I'll re-check my implementation

I re-checked the plot_attention part and found it was not plotting the encoder-decoder attention weights; the x-axis is actually the attention dim (256). I'll update the attention plots later
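To be concrete about the fix, here is a minimal sketch of what the plot should look like, assuming the weights come out as a `(n_heads, tgt_len, src_len)` numpy array (a hypothetical helper, not the repo's plotting code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so plots can be saved during training
import matplotlib.pyplot as plt


def plot_src_attention(attn, out_path):
    """Plot encoder-decoder attention weights, one panel per head.

    attn: numpy array of shape (n_heads, tgt_len, src_len), i.e. the
    softmax weights, NOT the (tgt_len, d_model) context tensor, whose
    last axis would be the attention dim (e.g. 256).
    """
    n_heads = attn.shape[0]
    fig, axes = plt.subplots(1, n_heads, figsize=(4 * n_heads, 4))
    if n_heads == 1:
        axes = [axes]
    for h, ax in enumerate(axes):
        ax.imshow(attn[h], aspect="auto", origin="lower")
        ax.set_title(f"head {h}")
        ax.set_xlabel("encoder frame (src_len)")  # x-axis is source time
        ax.set_ylabel("decoder step (tgt_len)")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```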

Here I give the last-layer encoder-decoder attention plots, since the first `mocha_first_layer - 1` layers have no `src_att`: **ESPnet default non-streaming transformer decoder:** ![decoder decoders 5 src_attn 1ep](https://user-images.githubusercontent.com/11382612/84633240-ad550300-af22-11ea-88e3-be234d70f211.png) **MMA decoder,...
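To make the layer layout concrete, a hypothetical sketch (parameter names assumed; not the neural_sp implementation) of a decoder where only layers from `mocha_first_layer` upward carry a `src_att` module, which is why the earlier layers have nothing to plot:

```python
import torch.nn as nn


class DecoderLayer(nn.Module):
    """Sketch of a decoder layer; forward() omitted for brevity."""

    def __init__(self, d_model, n_heads, src_attention=True):
        super().__init__()
        self.self_att = nn.MultiheadAttention(d_model, n_heads)
        # Only layers at or above `mocha_first_layer` attend to the encoder
        self.src_att = (nn.MultiheadAttention(d_model, n_heads)
                        if src_attention else None)


def build_decoder(n_layers=6, mocha_first_layer=4, d_model=256, n_heads=4):
    # Layers are 1-indexed here (assumption): the first
    # `mocha_first_layer - 1` layers get src_attention=False.
    return nn.ModuleList(
        DecoderLayer(d_model, n_heads, src_attention=(lyr >= mocha_first_layer))
        for lyr in range(1, n_layers + 1)
    )
```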

* The plots above are from a very small dataset (1000 utts) trained for 1 epoch; the performance may not be good, but I think it's effective for fast debugging
* I also trained...