speech_recognition
Vosk API: allow selecting different models and automatic model download
Hello, I added/changed two parameters in the recognize_vosk function, which I believe to be useful.
Firstly, I added a model parameter that allows selecting a model by its directory. This was previously hard-coded to a single directory named 'model', which made it impossible to easily switch models or languages.
Secondly, I noticed that Vosk-api is actually able to download models by itself, based on a given language code. So I implemented this as another parameter, 'language'. Previously there was a default language parameter provided in the function, but it was never used.
I implemented it so that the model parameter takes precedence over the language parameter, but it defaults to an empty string (which is falsy). So by default the model for the given language is downloaded automatically, because I believe this to be more convenient for the user. However, I'm aware that this changes the previous behavior and may break a user's implementation if they want to use a very specific model that they already downloaded (for example one of the larger models). If you think this is a problem, I could try to change that.
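For illustration, here is a minimal sketch of the precedence described above. The helper name `resolve_vosk_model_args` is hypothetical and not taken from the commit. The returned keys assume that `vosk.Model` accepts `model_path` and `lang` keyword arguments, which may depend on the installed Vosk version.

```python
def resolve_vosk_model_args(model="", language="en-us"):
    """Hypothetical helper: choose how the Vosk model should be loaded.

    Returns keyword arguments intended for vosk.Model(...). The key names
    `model_path` and `lang` are an assumption about the Vosk Python API.
    """
    if model:
        # An explicit model directory takes precedence over the language code.
        return {"model_path": model}
    # No directory given (empty string is falsy): fall back to the language
    # code, so that Vosk can look up and download a matching model itself.
    return {"lang": language}


# Default call: no model directory, so the language code is used.
print(resolve_vosk_model_args())

# Explicit directory (example path only): the language code is ignored.
print(resolve_vosk_model_args(model="models/my-large-model", language="de"))
```

With this shape, a user who already downloaded a larger model only needs to pass its directory, while everyone else gets the automatic download.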
Also see the updated README in the commit.
Please let me know what you think. Thanks!
PS: if this gets merged, I would also update the documentation and maybe write some tests for Vosk, which I think are still missing.
PPS: it may also be good to change the return value of the Vosk function to bring it more in line with the other functions, since I think it currently returns a JSON string instead of a simple string like the others. (Edit: mentioned here as well: https://github.com/Uberi/speech_recognition/pull/592 )
Edit: I have now also found this similar pull request, which adds a model path: https://github.com/Uberi/speech_recognition/pull/607
Thank you! I'll use it well
finalRecognition = rec.FinalResult()
finalRecognition = json.loads(finalRecognition)
return finalRecognition.get('text')
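The snippet above is a fragment from inside the recognizer function. A self-contained sketch of the same idea could look like the following. `extract_text` is a hypothetical helper name, and the sample string only assumes that the recognizer's final result is a JSON object with a `text` field, as the snippet implies.

```python
import json


def extract_text(final_result_json):
    """Hypothetical helper: pull the plain transcript out of a JSON result string."""
    result = json.loads(final_result_json)
    # Return an empty string instead of None when no "text" field is present,
    # so callers always receive a str (a design choice, not the snippet's behavior).
    return result.get("text", "")


# Hand-written sample standing in for the string returned by rec.FinalResult().
sample = '{"text": "hello world"}'
print(extract_text(sample))
```

Note that the original snippet returns `None` when the `text` key is missing, while this sketch returns an empty string.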
Hi everybody! Just out of curiosity, is there any particular reason not to merge this PR? It looks like a nice-to-have feature for which @rebootl proposed a valid implementation.
I would be happy to work on it if needed!