
Restricted vocabulary for transducer models

Open rohithkodali opened this issue 9 months ago • 4 comments

For CTC-based models, we can use HLG.fst to restrict the ASR output to a predefined vocabulary, ensuring that only specific words are recognized. However, streaming CTC models do not generate confidence scores in sherpa-onnx.

On the other hand, streaming Transducer models (e.g., Zipformer) can provide confidence scores, but they lack a built-in mechanism like HLG.fst to restrict vocabulary and force recognition within a predefined context.

Is there any way to constrain the vocabulary in a streaming Transducer model while still leveraging confidence scores? Any methods, modifications, or workarounds (such as token manipulation, LM biasing, or decoder constraints) that could help improve accuracy for domain-specific ASR?

rohithkodali, Feb 23 '25 03:02

Have you had a look at the hotwords implementation for contextual biasing? https://k2-fsa.github.io/sherpa/onnx/hotwords/index.html

manickavela29, Feb 23 '25 12:02

Yes, I have tried it. Hotwords only boost the probability of the listed words; they do not prevent other words from being recognized.
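To make the distinction concrete, here is a minimal toy sketch of soft biasing versus a hard vocabulary restriction. It is not sherpa-onnx code and does not use its API: the names `build_trie`, `allowed_next`, `decode`, and the hand-written `STEP_SCORES` are all hypothetical, and a real transducer decoder would also have to handle blank tokens and beam search. It only shows why a score bonus can still lose to an out-of-vocabulary token, while masking cannot.

```python
END = "<end>"  # hypothetical marker for "a phrase is complete here"


def build_trie(phrases):
    """Build a token-level prefix trie from a list of token sequences."""
    root = {}
    for tokens in phrases:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def allowed_next(node, root):
    """Tokens that continue the current phrase, or start a new one after END."""
    allowed = {t for t in node if t != END}
    if END in node:
        allowed |= {t for t in root if t != END}
    return allowed


def decode(step_scores, root, mode, bonus=2.0):
    """Greedy decoding over per-step token log-scores (toy stand-in for a decoder).

    mode="bias":     add `bonus` to in-vocabulary continuations (soft, like hotwords).
    mode="restrict": drop every token that is not an in-vocabulary continuation (hard).
    """
    node, out = root, []
    for scores in step_scores:
        allowed = allowed_next(node, root)
        if mode == "restrict":
            cand = {t: s for t, s in scores.items() if t in allowed}
        else:
            cand = {t: s + (bonus if t in allowed else 0.0) for t, s in scores.items()}
        if not cand:
            break  # nothing in the allowed vocabulary was scored at this step
        best = max(cand, key=cand.get)
        out.append(best)
        # Advance in the trie; fall back to the root when the token leaves it.
        node = node[best] if best in node else root.get(best, root)
    return out


# Made-up scores: the acoustically preferred second token ("own") is out of vocabulary.
STEP_SCORES = [
    {"turn": -0.2, "torn": -1.8},
    {"own": -0.1, "on": -3.0, "off": -4.0},
]
trie = build_trie([["turn", "on"], ["turn", "off"]])

biased = decode(STEP_SCORES, trie, mode="bias")
restricted = decode(STEP_SCORES, trie, mode="restrict")
print(biased, restricted)
```

With these made-up scores, the biased decode still emits the out-of-vocabulary token "own", while the restricted decode is forced onto "on". Raising `bonus` would change the biased result here, but it never guarantees that the output stays inside the vocabulary.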

rohithkodali, Feb 23 '25 13:02

However, CTC models do not generate confidence scores

Can you explain why CTC models don't generate confidence scores?

csukuangfj, Feb 23 '25 15:02

I mean that in the current sherpa-onnx, confidence scores are available only for streaming transducer models, not for streaming CTC models.

P.S. I have now corrected the statement in my original message, which was somewhat wrong before.

rohithkodali, Feb 23 '25 17:02