
[Streaming] Conv emformer right context length

Open bethant9 opened this issue 3 years ago • 19 comments

Hi, I'm noticing some weird behaviour when altering the right context length with the conv emformer. Basically, increasing the right context length while keeping the chunk size constant results in higher loss and worse WER. I can't understand why increasing the future context would degrade performance. I notice that in your experiments the right context length is always 1/4 of the chunk size. Is there a reason for this, i.e. does the right context length need to be fixed to a proportion of the chunk size? Many thanks in advance.

bethant9 avatar Oct 13 '22 10:10 bethant9

For the ConvEmformer model, the chunk-length and right-context-length are both fixed during training. It is possible that we would get worse results if we use different chunk-length or right-context-length during decoding.

yaozengwei avatar Oct 13 '22 10:10 yaozengwei

Sorry, I should have clarified. I am training the model with a larger right context length (and then decoding with the same values).

bethant9 avatar Oct 13 '22 10:10 bethant9

Could you show the numbers you set for both experiments?

yaozengwei avatar Oct 13 '22 10:10 yaozengwei

32 chunk size + 8 right context length (default) vs. 32 chunk size + 32 right context length. The second experiment gives worse results so far.

bethant9 avatar Oct 13 '22 10:10 bethant9

We have not tried such a configuration with equal chunk-length and right-context-length. Maybe you could try chunk-length=32 and right-context-length=12 or 16, to see whether it gives an improvement? You could also see the Emformer paper https://arxiv.org/pdf/2010.10759.pdf for details.

yaozengwei avatar Oct 13 '22 10:10 yaozengwei

I tried with right context length (RCL) = 16. It was better than RCL=32, but still worse than RCL=8. It seems from my rough experiments that increasing the RCL degrades performance.

bethant9 avatar Oct 13 '22 11:10 bethant9

Just wondering, is the training input padded with the right_context_length?

bethant9 avatar Oct 13 '22 16:10 bethant9

Ok. Our experiments on a streaming conformer trained with dynamic chunk size also show that increasing the right context does not consistently give improvements during decoding.

yaozengwei avatar Oct 14 '22 03:10 yaozengwei

@bethant9 Hi, can you upload the WER files for the models with different right context lengths? Or can you tell us the error types (S, D, I) of the different models?

kobenaxie avatar Oct 28 '22 12:10 kobenaxie

@bethant9 Can you try to increase the tail padding length in decode.py? https://github.com/k2-fsa/icefall/blob/6709bf1e6325166fcb989b1dbb03344d6b90b7f8/egs/librispeech/ASR/conv_emformer_transducer_stateless2/decode.py#L280 Maybe we need a larger tail padding length for a model with a larger right context length, to avoid losing the tail features.
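For reference, here is a minimal sketch (not the exact icefall code) of what a larger, right-context-aware tail padding could look like before running the streaming decode. The helper name `pad_tail`, the `LOG_EPS` constant, and the subsampling factor of 4 are assumptions for illustration:

```python
import math

import torch
import torch.nn.functional as F

# Illustrative padding value for log-mel features; not necessarily the
# constant used in icefall's decode.py.
LOG_EPS = math.log(1e-10)


def pad_tail(features: torch.Tensor, feature_lens: torch.Tensor,
             right_context_length: int, subsampling_factor: int = 4):
    """Append extra padding frames so the last chunk and its right context
    are not dropped by the streaming encoder.

    features: (N, T, C) log-mel features; feature_lens: (N,) valid lengths.
    """
    # Whether the multiplication by the subsampling factor is needed depends
    # on whether right_context_length is counted before or after subsampling
    # in the recipe; this is only a sketch.
    tail_pad = right_context_length * subsampling_factor
    features = F.pad(features, (0, 0, 0, tail_pad), value=LOG_EPS)
    feature_lens = feature_lens + tail_pad
    return features, feature_lens
```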

yaozengwei avatar Oct 28 '22 13:10 yaozengwei

Hi, I found that I needed to pad the training data with right-context-length frames. Otherwise, during training, right-context-length frames are removed from the input, which leads to incorrect training and higher WER.

bethant9 avatar Oct 28 '22 13:10 bethant9

If we don't pad during training, does the model with a longer right context produce a higher deletion error, especially at the end of the sentence?

kobenaxie avatar Oct 28 '22 14:10 kobenaxie

Yes, exactly. The ends of utterances aren't correctly trained, so the deletion error is high.

bethant9 avatar Oct 28 '22 14:10 bethant9

I solved this by padding with right-context-length frames just before the Emformer.
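In case it is useful to others, here is a minimal sketch of that kind of fix, assuming the padding is applied to the already-subsampled frames right before the Emformer layers. The helper name and the choice to also grow the lengths are illustrative, not the exact change made in the recipe:

```python
import torch
import torch.nn.functional as F


def pad_right_context(x: torch.Tensor, x_lens: torch.Tensor,
                      right_context_length: int):
    """Zero-pad the encoder input with right_context_length extra frames so
    the Emformer does not consume the tail of the utterance as right context.

    x: (N, T, C) frames fed to the Emformer; x_lens: (N,) valid lengths.
    """
    x = F.pad(x, (0, 0, 0, right_context_length))  # pad the time axis
    # Growing the lengths keeps the padded frames inside the attention mask;
    # whether that is desirable depends on how the recipe handles padding.
    x_lens = x_lens + right_context_length
    return x, x_lens
```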

bethant9 avatar Oct 28 '22 14:10 bethant9

Congratulations! Today I also found that the Emformer drops the last chunk, which may lead to deletion errors at the end. Thanks for your reply ~

kobenaxie avatar Oct 28 '22 14:10 kobenaxie

> Hi, I found that I needed to pad the training data with right-context-length frames. Otherwise, during training, right-context-length frames are removed from the input, which leads to incorrect training and higher WER.

Great!

yaozengwei avatar Oct 28 '22 14:10 yaozengwei

@bethant9 Hi, do you have a plan to open a PR to fix it?

kobenaxie avatar Oct 31 '22 02:10 kobenaxie

No, but I'm happy for you to do that if you want.

bethant9 avatar Oct 31 '22 09:10 bethant9

OK, I will give it a try, thanks ~

kobenaxie avatar Oct 31 '22 09:10 kobenaxie