sherpa-onnx
Resetting the stream without resetting encoder states results in deletion errors during streaming decoding
https://github.com/k2-fsa/sherpa-onnx/blob/ecc653871d305c79002d2630c7cf0d0e1d6bf1ed/sherpa-onnx/csrc/online-recognizer-transducer-impl.h#L382C1-L383C53
@csukuangfj
We have observed an issue since version 1.10.0 of sherpa-onnx. Specifically, the encoder state reset functionality was commented out in the following file:
Path: sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h
// reset encoder states
// s->SetStates(model_->GetEncoderInitStates());
(The encoder state reset was added in 1.9.26 for issue 924 and was commented out in 1.10.0, the release after 1.9.29.)
In our experiments using models trained with the Zipformer-2 encoder and stateless transducer across multiple Indic languages, we noticed the following behavior:
- When we reset the stream object during streaming decoding (link), the encoder states are not reset because that code is commented out (as in link), and this leads to deletion errors in the transcriptions after endpoint detection.
- This issue occurs while decoding real-time conversational audio with silence gaps of varying length.
However, when we uncomment the encoder state reset code, the issue is resolved or reduced, and transcription accuracy improves.
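To make the difference between the two variants concrete, here is a minimal toy sketch. It is not sherpa-onnx code: `ToyStream`, `InitStates`, `Step`, and `Reset` are hypothetical stand-ins, and the "encoder" is just a running average, assumed only for illustration. It shows that a reset which clears the decoding result but keeps the cached encoder states carries context from the previous segment into the next one, whereas restoring the initial states starts the next segment clean.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an online stream; only the parts relevant here.
struct ToyStream {
  std::vector<float> encoder_states;  // cached encoder context
  std::vector<int> tokens;            // decoded tokens of the current segment
};

// Hypothetical stand-in for model_->GetEncoderInitStates(): all-zero states.
std::vector<float> InitStates(std::size_t dim) {
  return std::vector<float>(dim, 0.0f);
}

// Toy "encoder step": folds one input frame into the cached states.
void Step(ToyStream &s, float frame) {
  assert(!s.encoder_states.empty());
  for (auto &v : s.encoder_states) {
    v = 0.5f * v + frame;
  }
}

// Toy reset after an endpoint. The decoding result is always cleared;
// the encoder states are restored only when reset_encoder_states is true,
// which corresponds to the commented-out s->SetStates(...) line.
void Reset(ToyStream &s, bool reset_encoder_states) {
  s.tokens.clear();
  if (reset_encoder_states) {
    s.encoder_states = InitStates(s.encoder_states.size());
  }
}
```

With `Reset(s, false)`, the states accumulated before the endpoint (including those from a long silence) are still present when the next utterance begins; with `Reset(s, true)`, they are back to the initial values. Whether this carry-over is what causes the deletions we observe is our assumption, not something this sketch demonstrates.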
Questions:
- Is there a specific reason for commenting out the encoder reset functionality?
- Was this change driven by certain experimental results?
- Are there any recommended workarounds to handle this scenario effectively?