sherpa Performance gap between icefall local streaming decoding and sherpa streaming decoding

Status: Open. shaynemei opened this issue 1 year ago • 12 comments

Using the same model (a streaming pruned_transducer_stateless5 trained on gigaspeech), we are seeing a performance gap between local icefall streaming decoding and sherpa server streaming decoding. WERs for all setups are computed with the same function: https://github.com/k2-fsa/icefall/blob/5149788cb2e0730d1537b9711dcfc5c4b11a0f4b/egs/librispeech/ASR/pruned_transducer_stateless5/decode.py#L597-L638

WER (%) on tedlium_dev:
  local batch decoding: 4.35
  local streaming decoding: 4.72
  sherpa server streaming decoding: 5.72

Aug 11 '22 17:08 shaynemei
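For reference, WER is the total number of word substitutions (S), deletions (D) and insertions (I) divided by the number of reference words (N), computed over the whole test set: WER = 100 · (S + D + I) / N. Below is a minimal standalone sketch of that computation using a plain word-level Levenshtein distance. It is not the icefall function linked above, and `word_edit_distance` and `corpus_wer` are hypothetical names.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (Levenshtein distance over words)."""
    # prev[j] = distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref word r has no match
                curr[j - 1] + 1,         # insertion: extra hyp word h
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total errors / total reference words."""
    errors = 0
    ref_words = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    return 100.0 * errors / ref_words


# Illustrative (made-up) reference/hypothesis pairs.
pairs = [
    ("the cat sat on the mat", "the cat sat on mat"),  # 1 deletion
    ("hello world", "hello there world"),              # 1 insertion
]
print(f"{corpus_wer(pairs):.2f}")  # 2 errors / 8 reference words -> 25.00
```

Because errors and reference words are summed before dividing, long utterances weigh more than short ones. This is why a handful of long utterances with many deletions can move the corpus WER noticeably.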

Could you compare the decoded results from the three setups? You can use vimdiff to compare the recogs-xxx.txt files.

Are there many <UNK>s in the sherpa-based decoding results for TEDLIUM?

Aug 11 '22 23:08 csukuangfj

@shaynemei Did you use decode-right-context=2 (the default value) in sherpa? If so, please try decode-right-context=0. We found that not all models benefit from right context.

Aug 11 '22 23:08 pkufool

Also, could you share your decoding commands for local batch decoding and local streaming decoding? I think the WER difference between them is a bit large. Thanks!

Aug 11 '22 23:08 pkufool

local batch decoding command:

./pruned_transducer_stateless5/decode.py \
  --epoch 4 \
  --avg 1 \
  --simulate-streaming False \
  --causal-convolution True \
  --use-averaged-model False

local streaming decoding command:

./pruned_transducer_stateless5/decode.py \
  --epoch 4 \
  --avg 1 \
  --simulate-streaming True \
  --causal-convolution True \
  --use-averaged-model False

Aug 11 '22 23:08 shaynemei

Actually, there aren't any <UNK>s in the sherpa-based decoding results for TEDLIUM.

Aug 11 '22 23:08 shaynemei

The utterances in the two recogs.txt files aren't in the same order, so I couldn't use vimdiff.

Aug 11 '22 23:08 shaynemei
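The two recogs files list utterances in different orders, so one workaround is to key the hypotheses by utterance ID before comparing them. This is a minimal sketch. It assumes each hypothesis line has the form `<utt_id>:\thyp=<text>`, as icefall-style recogs files are believed to use, and that the sherpa-side file follows the same layout. Please check this against your actual files. `load_hyps`, `diff_hyps` and `LINE_RE` are hypothetical names.

```python
import re

# Assumed hypothesis-line format: "<utt_id>:\thyp=<text>".
# Lines with "ref=" (or anything else) are ignored. Adjust if your files differ.
LINE_RE = re.compile(r"^(?P<utt>.+?):\s*hyp=(?P<text>.*)$")


def load_hyps(lines):
    """Map utterance ID -> hypothesis text, ignoring the original file order."""
    hyps = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            hyps[m.group("utt")] = m.group("text")
    return hyps


def diff_hyps(lines_a, lines_b):
    """Return (utt_id, hyp_a, hyp_b) for every utterance whose hypotheses
    differ, sorted by utterance ID so both files line up regardless of order.
    A missing utterance shows up as None on that side."""
    a, b = load_hyps(lines_a), load_hyps(lines_b)
    return [
        (utt, a.get(utt), b.get(utt))
        for utt in sorted(a.keys() | b.keys())
        if a.get(utt) != b.get(utt)
    ]


# Toy input: same utterances, different order, one differing hypothesis.
icefall_lines = ["utt2:\thyp=hello world\n", "utt1:\thyp=good morning\n"]
sherpa_lines = ["utt1:\thyp=good morning\n", "utt2:\thyp=hello\n"]
for utt, ha, hb in diff_hyps(icefall_lines, sherpa_lines):
    print(utt, "|", ha, "|", hb)  # prints: utt2 | hello world | hello
```

With real data you would pass the opened icefall and sherpa recogs files instead of the toy lists, since file objects iterate line by line.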

> @shaynemei Did you use decode-right-context=2 (the default value) in sherpa? If so, please try decode-right-context=0. We found that not all models benefit from right context.

@pkufool

I reran tedlium_dev with no right context (decode-right-context=0) and got a WER of 5.00.

Is this 0.28 absolute gap over local streaming (WER 4.72) expected for sherpa?

Aug 15 '22 22:08 shaynemei
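The gaps discussed so far can be expressed both in absolute WER points and relative to the local streaming baseline. Here is a small sketch that uses only the WERs reported in this thread. `gaps` is a hypothetical helper, and the label "first run" is used because the thread does not confirm which right-context setting that run used.

```python
# tedlium_dev WERs (%) reported in this thread.
wers = {
    "local batch": 4.35,
    "local streaming": 4.72,
    "sherpa (first run)": 5.72,
    "sherpa (decode-right-context=0)": 5.00,
}


def gaps(baseline: float, other: float) -> tuple[float, float]:
    """Absolute gap in WER points and relative gap in percent of the baseline."""
    absolute = other - baseline
    return round(absolute, 2), round(100.0 * absolute / baseline, 1)


base = wers["local streaming"]
for name in ("sherpa (first run)", "sherpa (decode-right-context=0)"):
    abs_gap, rel_gap = gaps(base, wers[name])
    print(f"{name}: +{abs_gap:.2f} absolute, +{rel_gap:.1f}% relative")
```

With these numbers it prints +1.00 absolute (+21.2% relative) for the first sherpa run and +0.28 absolute (+5.9% relative) after setting decode-right-context=0. For comparison, local streaming versus local batch decoding is +0.37 absolute (+8.5% relative).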

@csukuangfj @danpovey @pkufool just following up on this issue. Is there anything else I should provide?

Aug 23 '22 17:08 shaynemei

Sorry, I have not looked into it yet. I need to reproduce it locally first.

Aug 24 '22 01:08 csukuangfj

Do you need any help or additional information to reproduce it?

Sep 12 '22 09:09 shaynemei

Sorry for the late reply. Will look into it during the holiday.

Oct 01 '22 06:10 csukuangfj

@csukuangfj Is there any update on this issue? I am seeing a lot of deletion errors when decoding a streaming Zipformer model with sherpa.

-Sagar

Apr 25 '23 09:04 uni-sagar-raikar
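Since the last report is specifically about deletions, it may help to split the errors by type rather than looking at WER alone. This is a minimal standalone sketch, assuming whitespace-tokenized reference/hypothesis pairs. The helper names are hypothetical, and icefall's errs-*.txt output may already list these counts. The sketch builds the full edit-distance table, then walks back along one minimum-cost alignment and counts each step as a substitution, deletion or insertion.

```python
def error_breakdown(ref: list[str], hyp: list[str]) -> dict[str, int]:
    """Count substitutions, deletions and insertions on one minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # match / substitution
            )
    # Walk back from the bottom-right corner, classifying each step.
    counts = {"sub": 0, "del": 0, "ins": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            counts["sub"] += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["del"] += 1  # reference word missing from the hypothesis
            i -= 1
        else:
            counts["ins"] += 1  # extra word in the hypothesis
            j -= 1
    return counts


def corpus_breakdown(pairs: list[tuple[str, str]]) -> dict[str, int]:
    """Sum the per-utterance error counts over a list of (ref, hyp) strings."""
    total = {"sub": 0, "del": 0, "ins": 0}
    for ref_text, hyp_text in pairs:
        for kind, count in error_breakdown(ref_text.split(), hyp_text.split()).items():
            total[kind] += count
    return total


# Illustrative (made-up) pairs: the first hypothesis is cut short.
pairs = [
    ("the cat sat on the mat", "the cat sat"),
    ("good morning everyone", "good morning every one"),
]
print(corpus_breakdown(pairs))  # {'sub': 1, 'del': 3, 'ins': 1}
```

When several minimum-cost alignments exist, the split between error types can differ from other scoring tools. The total S + D + I is always the edit distance, though.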
