
Vosk api: allow selecting different models and automatic model download

Open rebootl opened this issue 2 years ago • 2 comments

Hello, I added/changed two parameters in the recognize_vosk function, which I believe to be useful.

Firstly, I added a model parameter that lets you select a model by its directory. Previously the path was hard-coded to a single directory named 'model', which made it impossible to switch easily between models, and therefore between languages.

Secondly, I noticed that Vosk-api is able to download models by itself, based on a given language code, so I implemented this as another parameter, 'language'. The function already had a language parameter with a default value, but it was never used.

I implemented it so that the model parameter takes precedence over the language parameter, but model defaults to an empty string (which is falsy). As a result, the language model is downloaded automatically by default, which I believe is more convenient for the user. However, I'm aware that this changes the previous behavior and may break existing user code that relies on a specific model the user has already downloaded (for example, one of the larger models). If you think this is a problem, I could try to change that.
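To illustrate the precedence described above, here is a minimal sketch, not the actual code from the commit. The helper names `resolve_vosk_model_kwargs` and `load_vosk_model` are hypothetical, the default language code "en-us" is an assumption, and the sketch assumes that `vosk.Model` accepts `model_path` and `lang` keyword arguments, as recent vosk releases do.

```python
def resolve_vosk_model_kwargs(model="", language="en-us"):
    """Decide how the Vosk model should be loaded (hypothetical helper).

    A non-empty `model` directory takes precedence; otherwise the
    language code is used so that Vosk can fetch a matching model.
    """
    if model:
        # Explicit model directory wins over the language code.
        return {"model_path": model}
    # Empty string is falsy: fall back to automatic download by language.
    return {"lang": language}


def load_vosk_model(model="", language="en-us"):
    """Build a vosk.Model from the resolved arguments (hypothetical helper).

    Assumes vosk.Model accepts `model_path` and `lang` keyword arguments.
    """
    from vosk import Model  # imported lazily so the resolver works without vosk

    return Model(**resolve_vosk_model_kwargs(model, language))
```

With this layout, `load_vosk_model(language="de")` would rely on the automatic download, while `load_vosk_model(model="path/to/model")` would use the given directory regardless of the language argument.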

Also see the updated README in the commit.

Please let me know what you think. Thanks!

PS: If this gets merged, I would also update the documentation and maybe write some tests for Vosk, which I think are still missing.

PPS: It may also be good to change the return value of the Vosk function to bring it in line with the other functions, since I think it currently returns a JSON string instead of a plain string like the others. (Edit: this is mentioned here as well: https://github.com/Uberi/speech_recognition/pull/592 )
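As a rough sketch of that change, the JSON string could be reduced to its transcript before it is returned. The helper name `extract_text` is hypothetical, and the sketch assumes the result is a JSON object with a "text" key.

```python
import json


def extract_text(vosk_result_json):
    """Return the plain transcript from a Vosk JSON result string.

    Hypothetical helper; assumes the JSON object carries the transcript
    under the "text" key and falls back to an empty string otherwise.
    """
    return json.loads(vosk_result_json).get("text", "")
```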

Edit: I have also just found this similar pull request, which adds a model path: https://github.com/Uberi/speech_recognition/pull/607

rebootl, Jan 31 '23 14:01

Thank you! I'll use it well

finalRecognition = rec.FinalResult()
finalRecognition = json.loads(finalRecognition)

return finalRecognition.get('text')

Bae-ChangHyun, Nov 03 '23 10:11

Hi everybody! Just out of curiosity, is there any particular reason not to merge this PR? It looks like a nice-to-have feature, and @rebootl has proposed a valid implementation.

I would be happy to work on it if needed!

Luca-Pozzi, Apr 30 '24 13:04