whisper.cpp
Add ability to limit auto-detection to a subset of languages
From what I can tell, auto-detection simply picks the language with the highest probability score. Sometimes I know the language can only be one of five possible languages. In such cases, it would be useful to be able to specify the possible languages, to make it more likely that auto-detection picks the correct one.
You can use the whisper_pcm_to_mel() + whisper_lang_auto_detect() API. You will get the probs for all languages in the lang_probs array:
https://github.com/ggerganov/whisper.cpp/blob/59a3d0cb576db605f76f82f07350647837e15c7a/whisper.h#L244-L255
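A minimal sketch of that approach in C, assuming the whisper.h API at the linked commit, where whisper_lang_auto_detect() fills lang_probs with whisper_lang_max_id() + 1 entries. The helper pick_allowed_lang() is hypothetical and not part of whisper.cpp: it just restricts the argmax to a caller-supplied list of language ids.

```c
#include <stddef.h>

/* Hypothetical helper (not part of whisper.cpp).
 * lang_probs: probabilities indexed by language id, n_probs entries
 *             (e.g. as filled by whisper_lang_auto_detect()).
 * allowed_ids: language ids to consider (e.g. from whisper_lang_id()).
 * Returns the allowed id with the highest probability, or -1 if none
 * of the allowed ids is a valid index into lang_probs. */
int pick_allowed_lang(const float *lang_probs, int n_probs,
                      const int *allowed_ids, size_t n_allowed) {
    int best_id = -1;
    float best_p = -1.0f; /* below any real probability */

    for (size_t i = 0; i < n_allowed; ++i) {
        int id = allowed_ids[i];
        /* skip unknown languages (whisper_lang_id() returns -1) */
        if (id < 0 || id >= n_probs) {
            continue;
        }
        /* strict '>' keeps the first allowed id on ties */
        if (lang_probs[id] > best_p) {
            best_p = lang_probs[id];
            best_id = id;
        }
    }
    return best_id;
}
```

In use, the probabilities would come from calling whisper_pcm_to_mel() and then whisper_lang_auto_detect(ctx, 0, n_threads, probs), and the allowed ids from calls such as whisper_lang_id("pt"). Assuming the usual whisper_full_params, the chosen id could be turned back into a code with whisper_lang_str() and set as params.language, so that whisper_full() transcribes in that language instead of running its own unrestricted detection.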
Thanks for the hint. That definitely works. It would still be nice to have a single param for this though.
whisper.cpp_limit_language_autodetection_patch.diff.gz
Here's a little patch you can try. It extends the "auto" value of the -l parameter in the main example so that you can give it a list of allowed languages. So instead of -l auto, you would use something like -l auto:pt,es,sv,en.
Please note that although this seems to be working, I won't be making a PR out of it. But feel free to use the code as you wish.
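For readers who can't open the attachment, here is a rough sketch of how a value like auto:pt,es,sv,en could be parsed. This is not the code from the patch; the function name parse_auto_lang_list() and the MAX_LANG_CODE limit are hypothetical. Each resulting code could then be passed to whisper_lang_id() to get the language id.

```c
#include <string.h>

/* Hypothetical limit on one language code's length, including the NUL. */
#define MAX_LANG_CODE 8

/* Parse a -l value such as "auto:pt,es,sv,en" into separate codes.
 * Returns the number of codes written (at most max_codes),
 * 0 for plain "auto" (no restriction), or -1 if the value is not
 * of the form "auto" / "auto:...". Empty or over-long codes are skipped. */
int parse_auto_lang_list(const char *arg, char codes[][MAX_LANG_CODE], int max_codes) {
    if (strncmp(arg, "auto", 4) != 0) {
        return -1;
    }
    if (arg[4] == '\0') {
        return 0;  /* plain "auto" */
    }
    if (arg[4] != ':') {
        return -1; /* e.g. "automatic" is rejected */
    }

    int n = 0;
    const char *p = arg + 5; /* start of the comma-separated list */
    while (*p != '\0' && n < max_codes) {
        size_t len = strcspn(p, ","); /* length up to next comma or end */
        if (len > 0 && len < MAX_LANG_CODE) {
            memcpy(codes[n], p, len);
            codes[n][len] = '\0';
            ++n;
        }
        p += len;
        if (*p == ',') {
            ++p; /* step over the separator */
        }
    }
    return n;
}
```

With this shape, plain -l auto keeps the current behaviour, and any other value falls through to the normal single-language handling.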