
Resetting stream without resetting encoder states resulting in deletion errors during streaming decoding

Open srinivasakm opened this issue 10 months ago • 2 comments

https://github.com/k2-fsa/sherpa-onnx/blob/ecc653871d305c79002d2630c7cf0d0e1d6bf1ed/sherpa-onnx/csrc/online-recognizer-transducer-impl.h#L382C1-L383C53

@csukuangfj We have observed an issue since version 1.10.0 of sherpa-onnx. Specifically, the encoder state reset was commented out in the following file:

Path: sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h

// reset encoder states
// s->SetStates(model_->GetEncoderInitStates());

(The encoder state reset was added in 1.9.26 as part of issue 924 and was commented out in 1.10.0, after 1.9.29.)

In our experiments using models trained with the Zipformer-2 encoder and stateless transducer across multiple Indic languages, we noticed the following behavior:

  1. When we reset the stream object during streaming decoding (link), leaving the encoder state reset commented out (as in link) results in deletion errors in the transcription after endpoint detection.
  2. The issue occurs while decoding real-time conversational audio with silence gaps of varying length in between.

However, when we uncomment the encoder state reset code, the issue is resolved or at least reduced, and transcription accuracy improves.
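To make the difference between the two behaviours concrete, here is a minimal, self-contained sketch. It is not the sherpa-onnx implementation: `ToyModel`, `ToyStream`, `AcceptChunk`, `ClearDecodingResult`, and `ResetStream` are hypothetical stand-ins, and the "encoder" is just a running average. Only the names `SetStates` and `GetEncoderInitStates` are borrowed from the snippet above. The `reset_encoder_states` flag plays the role of the commented-out line.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the model; NOT the sherpa-onnx class.
struct ToyModel {
  // Plays the role of GetEncoderInitStates(): a fresh, all-zero cache.
  std::vector<float> GetEncoderInitStates() const {
    return std::vector<float>(4, 0.0f);
  }
};

// Hypothetical stand-in for a streaming decoder stream.
class ToyStream {
 public:
  explicit ToyStream(const ToyModel &model)
      : states_(model.GetEncoderInitStates()) {}

  void SetStates(std::vector<float> states) { states_ = std::move(states); }
  const std::vector<float> &GetStates() const { return states_; }

  // Pretend to run one encoder chunk: the cache accumulates audio history.
  void AcceptChunk(float value) {
    for (auto &s : states_) s = 0.5f * s + value;
    ++num_processed_chunks_;
  }

  int32_t NumProcessedChunks() const { return num_processed_chunks_; }

  // Clears per-segment decoding bookkeeping only; the encoder cache is
  // left untouched.
  void ClearDecodingResult() { num_processed_chunks_ = 0; }

 private:
  std::vector<float> states_;
  int32_t num_processed_chunks_ = 0;
};

// Reset a stream at an endpoint. With reset_encoder_states == false the
// encoder cache from the previous segment carries over into the next one
// (the behaviour we see since 1.10.0); with true it is re-initialised
// (the behaviour when the line is uncommented).
void ResetStream(const ToyModel &model, ToyStream *s,
                 bool reset_encoder_states) {
  s->ClearDecodingResult();
  if (reset_encoder_states) {
    s->SetStates(model.GetEncoderInitStates());
  }
}
```

In this toy setup, a stream reset with `reset_encoder_states = false` starts the next segment with whatever cache the previous segment left behind, while `true` starts it from the initial states. Whether carrying the cache across a long silence is what causes the deletions in the real encoder is our assumption, not something this sketch proves.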

Questions:

  1. Is there a specific reason for commenting out the encoder reset functionality?
  2. Was this change driven by certain experimental results?
  3. Are there any recommended workarounds to handle this scenario effectively?

srinivasakm, Jan 12 '25 16:01