vosk-api

Question about using multiple recognizers with same model and different grammar

Open juntaocheng opened this issue 1 year ago • 2 comments

Hi Team, I am using the Vosk API for continuous speech recognition in an Android application. The offline ASR works very well, and supplying a grammar word list gives very high recognition accuracy. I now want the app to accept more complicated voice commands. For example, when the user first says "open file", I will ask them to provide the file name. In the second phase, I need to recognize the file name from their voice input. However, since all possible file names are loaded dynamically from a database, I would need to start a new recognizer (with the file names as its grammar) and a new SpeechService. So my question is whether dynamically constructing multiple short-lived recognizers and SpeechService instances is expensive in terms of resources. Alternatively, would it cause problems to keep several recognizers/services in memory and switch between them based on application context? Thank you for any suggestions. Regards, Steven

juntaocheng, Aug 03 '22 09:08

Hello. This is not an easy question. You cannot add file names dynamically yet; every word in the grammar must already be in the model's vocabulary.

Also, we do not recommend switching between grammars too frequently. You can just compile one big grammar with all the commands you need and use that. It will not be much less accurate than separate grammars, but it will certainly be more robust to other inputs.
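The combined-grammar suggestion above could be sketched roughly as follows. This is a minimal illustration, not the project's recommended implementation: `build_grammar` and `make_recognizer` are hypothetical helper names, the command and file-name lists are made up, and lowercasing the phrases is an assumption about the model's vocabulary. The grammar is a JSON array of phrases passed as a string to `KaldiRecognizer`, with `"[unk]"` added to absorb anything outside the list.

```python
import json


def build_grammar(commands, filenames):
    """Merge fixed commands and dynamic file names into one grammar string.

    Hypothetical helper: phrases are lowercased, whitespace-normalized and
    de-duplicated, then "[unk]" is appended to catch out-of-grammar speech.
    """
    phrases = []
    seen = set()
    for phrase in list(commands) + list(filenames):
        normalized = " ".join(phrase.lower().split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            phrases.append(normalized)
    phrases.append("[unk]")
    return json.dumps(phrases)


def make_recognizer(model_path, grammar, sample_rate=16000):
    """Create a single recognizer for the whole combined grammar.

    The import is local so the grammar helper can be used without vosk
    installed. model_path is a placeholder for an unpacked model directory.
    """
    from vosk import Model, KaldiRecognizer

    model = Model(model_path)
    return KaldiRecognizer(model, sample_rate, grammar)


# Example data (assumed): in the app, file names would come from the database.
grammar = build_grammar(
    ["open file", "close file"],
    ["Quarterly Report", "meeting notes", "open file"],
)
```

With one recognizer covering both commands and file names, the application can decide from its own state (for example, "waiting for a file name") how to interpret each recognized phrase, instead of swapping recognizers.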

nshmyrev, Aug 03 '22 09:08

Hi Nickolay, Yes, that would be a problem if some of the terms are not in the standard English vocabulary. I saw the introduction to the various levels of customization, and I think introducing new words into the model would be quite involved. I will try the combined big grammar approach first. Thanks for your advice.
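Since every grammar word has to be in the model's vocabulary, one way to catch problem file names early is to screen them before building the grammar. The sketch below assumes the vocabulary is already available as a plain Python set; how that set is obtained from a given model is not shown here, and `split_by_vocabulary` is a hypothetical helper, not part of the Vosk API.

```python
def split_by_vocabulary(phrases, vocabulary):
    """Separate phrases whose words are all known from those that are not.

    Returns (usable, rejected), where rejected holds (phrase, missing_words)
    pairs so the app can report or rename the offending entries.
    """
    usable, rejected = [], []
    for phrase in phrases:
        words = phrase.lower().split()
        missing = [word for word in words if word not in vocabulary]
        if words and not missing:
            usable.append(phrase)
        else:
            rejected.append((phrase, missing))
    return usable, rejected


# Toy vocabulary for illustration only; a real model's word list is far larger.
vocabulary = {"open", "file", "meeting", "notes", "report"}
usable, rejected = split_by_vocabulary(["meeting notes", "xq7 report"], vocabulary)
```

Only the `usable` phrases would then go into the combined grammar; the `rejected` ones need a different handling, such as a spoken alias made of in-vocabulary words.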

juntaocheng, Aug 03 '22 09:08