whisper.cpp
Using hotwords to "bias" transcription (or limit the vocabulary in some way)
Hello there.
I believe a common use of Whisper is to fine-tune a smaller model (e.g., base/small) on your own data and then use it in a specific context. However, one limitation of Whisper compared to some earlier ASR systems (such as Coqui STT with KenLM as a "scorer") is that there's no way (that I know of) to use a "vocabulary" to limit what can be transcribed. For example, in a medical context, I wouldn't want "la aorta" to sometimes be recognized as "la horta".
It would be great if whisper.cpp could have something to help with this issue. In particular, I thought the user could supply a list of words from a specific context (in a medical context, for example, organs or diseases). Then, during transcription, the inference could be "biased" towards the words in that list.
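To make the idea concrete, here is a rough sketch of what I mean by "biasing". It is a toy and not whisper.cpp code: the function name `bias_logits`, the `boost` value, and the tiny token vocabulary are all made up for illustration. The assumption is that at each decoding step we have a score (logit) per candidate token, and we add a bonus to any token that would start or continue one of the hotwords.

```python
def bias_logits(logits, prefix, hotwords, boost=2.0):
    """Return a copy of `logits` with hotword continuations boosted.

    logits   -- dict mapping candidate token -> score (hypothetical decoder output)
    prefix   -- list of tokens decoded so far
    hotwords -- list of hotwords, each already split into a list of tokens
    boost    -- bonus added to favored tokens (arbitrary value for this sketch)
    """
    favored = set()
    for hw in hotwords:
        # Find the longest k such that the last k decoded tokens equal the
        # first k tokens of the hotword; k == 0 means "hotword may start here".
        for k in range(min(len(prefix), len(hw) - 1), -1, -1):
            if k == 0 or prefix[-k:] == hw[:k]:
                favored.add(hw[k])  # the token that would extend the match
                break

    biased = dict(logits)
    for token in favored:
        if token in biased:
            biased[token] += boost  # each token is boosted at most once
    return biased


# Toy example: without the bias, " h" (as in "horta") narrowly wins.
hotwords = [[" a", "orta"]]
step1 = bias_logits({" a": 1.0, " h": 1.2}, ["la"], hotwords)
step2 = bias_logits({"orta": 0.5, "lta": 0.9}, ["la", " a"], hotwords)

print(max(step1, key=step1.get))
print(max(step2, key=step2.get))
```

Because this only nudges scores instead of forbidding tokens, words outside the list can still be transcribed when the audio evidence is strong enough, which is the behavior I'm after.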
Check out the support for GBNF grammars, and the grammar_penalty param. That might get you on your way.
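If you already have the word list, generating the grammar can be scripted. Below is a minimal sketch that turns a list of terms into a GBNF grammar string. The helper names (`gbnf_escape`, `build_hotword_grammar`) and the rule layout (a `root` made of space-separated `term`s, with a leading space) are my own assumptions and not something taken from the whisper.cpp repo, so compare the output against the sample grammars shipped with the project before relying on it.

```python
def gbnf_escape(text):
    """Escape backslashes and double quotes for use inside a GBNF string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_hotword_grammar(words):
    """Build a GBNF grammar that accepts one or more terms from `words`.

    The rule names `root` and `term` are arbitrary choices for this sketch.
    """
    # Deduplicate and sort so the generated grammar is stable across runs.
    unique = sorted({w.strip() for w in words if w.strip()})
    if not unique:
        raise ValueError("need at least one non-empty word")

    alternatives = " | ".join('"' + gbnf_escape(w) + '"' for w in unique)
    lines = [
        # Assumed layout: each term is preceded by a single space.
        'root ::= " " term (" " term)*',
        "term ::= " + alternatives,
    ]
    return "\n".join(lines) + "\n"


grammar = build_hotword_grammar(["la aorta", "el miocardio", "la aorta"])
print(grammar)
```

With a soft grammar_penalty the grammar acts as a bias and not a hard constraint; how large the penalty should be is something to tune on your own audio.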
I experimented with grammars some months ago; iirc transcription speed ended up being a huge problem because the vocabulary I want to restrict to contains many, many words. But I'll try to revisit grammars just in case. I also came across these: #235 (I pretty much have the same problem), #271 (I might also try this), and https://github.com/ggerganov/whisper.cpp/discussions/190#discussioncomment-8504735.