sherpa-onnx [BUG] OnlineRecognizer not outputting certain tokens after endpoint detection

Open lgarcia-trebe opened this issue 11 months ago • 3 comments

I'm testing speech recognition from a microphone with endpoint detection using the provided Python example, and I've found the following issue: when the first token of a segment is the same as the last token of the previous segment, that token is missing from the new segment's result.

Examples:

(1) Input and expected result
0: THIS IS MY CAR
1: CARRYING SOMETHING

(1) Obtained result
0: THIS IS MY CAR
1: RYING SOMETHING

(2) Input and expected result
0: THIS IS MY CAR
1: CARWASH

(2) Obtained result
0: THIS IS MY CAR
1: WASH

Possible cause and workaround

It seems that the issue is tied to not properly resetting the stream on endpoint detection. If instead of calling recognizer.reset(stream) we directly create a new stream (stream = recognizer.create_stream()), the issue is no longer present. I've seen that calling reset doesn't reset the feature extractor, which might be the underlying cause.
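For reference, here is a minimal sketch of the workaround described above. The loop is modeled on the endpoint-detection microphone example, but the helper name recognize_segments and its parameters are made up for illustration, and it assumes get_result returns the text as a string, as in that example. Anything that exposes the same recognizer and stream methods can be passed in. Whether discarding the stream has side effects beyond clearing the feature extractor is exactly the open question, so treat this as a starting point, not a confirmed fix.

```python
def recognize_segments(recognizer, chunks, sample_rate=16000):
    """Decode audio chunks and return one text string per detected segment.

    Hypothetical helper: `recognizer` is expected to behave like a
    sherpa-onnx OnlineRecognizer, and `chunks` is any iterable of
    sample arrays (for example, blocks read from a microphone).
    """
    segments = []
    stream = recognizer.create_stream()
    for samples in chunks:
        stream.accept_waveform(sample_rate, samples)

        # Decode everything that is currently available.
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)

        if recognizer.is_endpoint(stream):
            text = recognizer.get_result(stream)
            if text:
                segments.append(text)
            # Workaround from this report: start a brand-new stream
            # instead of calling recognizer.reset(stream), so no state
            # from the previous segment carries over.
            stream = recognizer.create_stream()
    return segments
```

With the original code path (recognizer.reset(stream) in place of the last create_stream() call), the second segment in the examples above loses its leading token; with this version it does not in my tests.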

I'm new to speech recognition and not yet familiar with all the implementation details, so I'm not sure what the implications are of also resetting the feature extractor, or of creating a new stream object directly. Any information or guidance is appreciated.

Trained models used

Reproduced the issue with both a custom trained model and one of the repo's public models (sherpa-onnx-streaming-zipformer-en-2023-06-21).

Dec 18 '24 12:12 lgarcia-trebe
