sherpa-onnx
Restricted vocabulary for transducer models
For CTC-based models, we can use HLG.fst to restrict the ASR output to a predefined vocabulary, ensuring that only specific words are recognized. However, streaming CTC models do not generate confidence scores in sherpa-onnx.
On the other hand, streaming Transducer models (e.g., Zipformer) can provide confidence scores, but they lack a built-in mechanism like HLG.fst to restrict vocabulary and force recognition within a predefined context.
Is there any way to constrain the vocabulary in a streaming Transducer model while still leveraging confidence scores? Are there any methods, modifications, or workarounds (such as token manipulation, LM biasing, or decoder constraints) that could help improve accuracy for domain-specific ASR?
Have you looked at the hotwords implementation for contextual biasing? https://k2-fsa.github.io/sherpa/onnx/hotwords/index.html
Yes, I have tried it. Hotwords only raise the probability of the listed words; they do not prevent other words from being recognized.
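To make the distinction concrete, here is a toy sketch of soft biasing versus a hard vocabulary constraint at a single decoding step. It is not sherpa-onnx code and does not use any sherpa-onnx API: the helper names (`build_trie`, `bias_logits`, `mask_logits`), the token IDs, the logit values, and the assumption that token 0 is the blank symbol are all made up for illustration.

```python
import math

def build_trie(token_seqs):
    """Build a prefix trie (nested dicts) from allowed token-ID sequences."""
    root = {}
    for seq in token_seqs:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def bias_logits(logits, node, bonus):
    """Soft biasing: add a bonus to tokens that continue a path in the trie."""
    return [x + bonus if t in node else x for t, x in enumerate(logits)]

def mask_logits(logits, node, always_allowed=(0,)):
    """Hard constraint: forbid every token that does not continue a trie path.

    Assumption: token 0 is the blank symbol and must always stay allowed.
    """
    return [
        x if (t in node or t in always_allowed) else -math.inf
        for t, x in enumerate(logits)
    ]

def argmax(xs):
    """Index of the largest value in xs."""
    return max(range(len(xs)), key=xs.__getitem__)

# Hypothetical setup: 5 tokens, allowed "words" are the sequences [1, 2] and [3].
trie = build_trie([[1, 2], [3]])

# Token 4 is outside the allowed vocabulary, but the model strongly prefers it.
logits = [0.1, 0.5, 0.2, 0.3, 4.0]

# With biasing, an out-of-vocabulary token can still win.
biased_choice = argmax(bias_logits(logits, trie, bonus=1.5))

# With masking, only tokens that continue an allowed word (or blank) can win.
masked_choice = argmax(mask_logits(logits, trie))

# After emitting token 1, only token 2 (or blank) may follow.
next_choice = argmax(mask_logits(logits, trie[1]))
```

In this toy case the biased step still picks the out-of-vocabulary token 4, while the masked step picks token 1. A real constraint would have to be applied inside the transducer search itself and handle word boundaries and multiple hypotheses, which this sketch ignores.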
However, CTC models do not generate confidence scores
Can you explain why CTC models don't generate confidence scores?
I mean that the current sherpa-onnx provides confidence scores only for streaming transducer models, not for streaming CTC models.
P.S. I have now corrected my statement in the original message, which was somewhat wrong before.