
Performance gap between icefall local streaming decoding and sherpa streaming decoding

Open • shaynemei opened this issue 1 year ago • 12 comments

Using the same model (a streaming pruned_transducer_stateless5 trained on gigaspeech), we are seeing a performance gap between local icefall streaming decoding and sherpa server streaming decoding. WERs for both setups are computed with the same function: https://github.com/k2-fsa/icefall/blob/5149788cb2e0730d1537b9711dcfc5c4b11a0f4b/egs/librispeech/ASR/pruned_transducer_stateless5/decode.py#L597-L638

WER (%) on tedlium_dev:
  local batch decoding: 4.35
  local streaming decoding: 4.72
  sherpa server streaming decoding: 5.72
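For readers comparing these numbers: WER is the usual word-level edit-distance rate, (S + D + I) / N × 100, where S, D and I are substitutions, deletions and insertions against N reference words. The sketch below is a plain Levenshtein alignment written for illustration only; `edit_counts` and `corpus_wer` are hypothetical names, and this is not the icefall function linked above. Reporting S, D and I separately makes it easier to see which kind of error accounts for a gap between setups.

```python
from typing import List, Sequence, Tuple


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total_errors, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match: no extra cost
                continue
            e, s, d, ins = dp[i - 1][j - 1]
            sub = (e + 1, s + 1, d, ins)
            e, s, d, ins = dp[i - 1][j]
            dele = (e + 1, s, d + 1, ins)
            e, s, d, ins = dp[i][j - 1]
            insr = (e + 1, s, d, ins + 1)
            dp[i][j] = min(sub, dele, insr, key=lambda t: t[0])
    _, s, d, ins = dp[n][m]
    return s, d, ins


def corpus_wer(
    pairs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, int, int, int]:
    """Corpus-level WER (%) plus total S, D, I over (ref_words, hyp_words) pairs."""
    subs = dels = ins = n_ref = 0
    for ref, hyp in pairs:
        s, d, i = edit_counts(ref, hyp)
        subs, dels, ins, n_ref = subs + s, dels + d, ins + i, n_ref + len(ref)
    if n_ref == 0:
        raise ValueError("no reference words")
    return 100.0 * (subs + dels + ins) / n_ref, subs, dels, ins
```

For example, a hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts as one deletion, so if one setup's extra errors are mostly in the D column, that points at dropped words rather than misrecognized ones.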

shaynemei • Aug 11 '22

Could you compare the decoded results among them? You can use vimdiff to compare the recogs-xxx.txt files.

Are there many <UNK>s in the sherpa-based decoding for TEDLIUM?
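One quick way to answer the <UNK> question is to count <UNK> tokens on the hypothesis lines of each recogs file. This is a minimal sketch assuming icefall-style lines of the form `<utt_id>:<TAB>hyp=['w1', 'w2', ...]`, with the words written as a Python list literal; `count_unk` is a hypothetical helper, so adjust the parsing if your files differ.

```python
import ast
from typing import Iterable


def count_unk(lines: Iterable[str], unk: str = "<UNK>") -> int:
    """Count `unk` tokens on the hypothesis lines of a recogs file.

    Assumes icefall-style lines such as "<utt_id>:<TAB>hyp=['w1', 'w2']";
    reference lines and any other lines are ignored.
    """
    total = 0
    for line in lines:
        _, sep, rest = line.rstrip("\n").partition(":\t")
        if sep and rest.startswith("hyp="):
            # Parse the list literal safely instead of using eval().
            words = ast.literal_eval(rest[len("hyp="):])
            total += sum(1 for w in words if w == unk)
    return total
```

Usage would be along the lines of `with open(path) as f: print(count_unk(f))`, run once per file; only whole tokens equal to `<UNK>` are counted, not substrings of other words.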

csukuangfj • Aug 11 '22

@shaynemei Did you use decode-right-context=2 (the default value) in sherpa? If so, please try decode-right-context=0. We found that not all models benefit from right context.

pkufool • Aug 11 '22

Also, could you share your decoding commands for local batch decoding and local streaming decoding? I think the WER difference between them is a little large. Thanks!

pkufool • Aug 11 '22

Local batch decoding command:

./pruned_transducer_stateless5/decode.py \
  --epoch 4 \
  --avg 1 \
  --simulate-streaming False \
  --causal-convolution True \
  --use-averaged-model False

Local streaming decoding command:

./pruned_transducer_stateless5/decode.py \
  --epoch 4 \
  --avg 1 \
  --simulate-streaming True \
  --causal-convolution True \
  --use-averaged-model False

shaynemei • Aug 11 '22

Actually, there aren't any <UNK>s in the sherpa-based decoding for TEDLIUM.

shaynemei • Aug 11 '22

The utterances in the two recogs.txt files aren't in the same order, so I couldn't use vimdiff.
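Since the two files list utterances in different orders, keying them by utterance ID sidesteps the ordering problem. A minimal sketch, assuming both files use icefall-style `<utt_id>:<TAB>hyp=[...]` lines; `load_hyps`, `diff_hyps` and `print_diffs` are hypothetical helpers, not part of icefall or sherpa.

```python
import ast
from typing import Dict, Iterable, List, Optional, Tuple


def load_hyps(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map utterance ID -> hypothesis words.

    Assumes icefall-style lines such as "<utt_id>:<TAB>hyp=['a', 'b']";
    reference lines and anything else are skipped.
    """
    hyps: Dict[str, List[str]] = {}
    for line in lines:
        utt_id, sep, rest = line.rstrip("\n").partition(":\t")
        if sep and rest.startswith("hyp="):
            hyps[utt_id] = ast.literal_eval(rest[len("hyp="):])
    return hyps


def diff_hyps(
    a: Dict[str, List[str]], b: Dict[str, List[str]]
) -> List[Tuple[str, Optional[List[str]], Optional[List[str]]]]:
    """Return (utt_id, hyp_a, hyp_b) for each utterance whose hypotheses differ.

    Results are sorted by utterance ID; None marks an ID missing from one side.
    """
    diffs = []
    for utt_id in sorted(set(a) | set(b)):
        hyp_a, hyp_b = a.get(utt_id), b.get(utt_id)
        if hyp_a != hyp_b:
            diffs.append((utt_id, hyp_a, hyp_b))
    return diffs


def print_diffs(path_a: str, path_b: str) -> None:
    """Print every differing pair of hypotheses from two recogs files."""
    with open(path_a) as fa, open(path_b) as fb:
        for utt_id, hyp_a, hyp_b in diff_hyps(load_hyps(fa), load_hyps(fb)):
            print(utt_id)
            print("  a:", " ".join(hyp_a or []))
            print("  b:", " ".join(hyp_b or []))
```

Calling `print_diffs` with the icefall file as the first path and the sherpa file as the second lists only the utterances that disagree. Alternatively, sorting both files line by line (for example with `sort`) before running vimdiff should also line them up, as long as both use the same ID scheme.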

shaynemei • Aug 11 '22

@pkufool Following your suggestion, I reran TEDLIUM_DEV with decode-right-context=0 (no right context) and got a WER of 5.00.

Is this 0.28 absolute gap relative to local streaming decoding (WER 4.72) expected for sherpa?

shaynemei • Aug 15 '22

@csukuangfj @danpovey @pkufool just following up on this issue. Is there anything else I should provide?

shaynemei • Aug 23 '22

Sorry, I have not looked into it yet. I need to reproduce it locally first.

csukuangfj • Aug 24 '22

Do you need any help or additional information to reproduce it?

shaynemei • Sep 12 '22

Sorry for the late reply. Will look into it during the holiday.

csukuangfj • Oct 01 '22

@csukuangfj Is there any update on this issue? I am seeing a lot of deletion errors when decoding a streaming zipformer model with sherpa.

-Sagar

uni-sagar-raikar • Apr 25 '23