vosk-api
Realtime STT with large model
I wanted to test realtime speech recognition with the large model (vosk-model-en-us-0.22) for extra accuracy. When I use the result from a call to PartialResult after AcceptWaveform, the delay is mostly acceptable, but a call to Result or FinalResult can take quite a long time. I understand that the large model is perhaps not designed for realtime processing, but the output of PartialResult looks usable. Does a call to Result or FinalResult provide an improvement over PartialResult? Or would it be possible to use only the partial result and skip the final processing?
It is recommended to use the final result. The final result should be reasonably fast. What hardware are you running on that makes it slow?
I'm testing on a Ryzen 5 3600, and calls to Result can take more than 4 seconds, which happens quite often. That is for the large model (vosk-model-en-us-0.22); for the smaller models (vosk-model-en-us-0.22-lgraph and vosk-model-small-en-us-0.15) the delay is small.
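For anyone who wants to reproduce this kind of measurement, here is a minimal sketch that times each recognizer call separately. The helper name `time_recognizer_calls` is hypothetical, and it assumes the recognizer behaves like `vosk.KaldiRecognizer`: `AcceptWaveform` returns a truthy value at the end of an utterance, and `PartialResult`, `Result` and `FinalResult` return JSON strings.

```python
import json
import time


def time_recognizer_calls(recognizer, chunks):
    """Feed audio chunks and record how long each result call takes.

    `recognizer` is assumed to expose the KaldiRecognizer-style methods
    AcceptWaveform(bytes), PartialResult(), Result() and FinalResult().
    `chunks` is any iterable of raw PCM byte strings.
    """
    timings = {"PartialResult": [], "Result": [], "FinalResult": []}
    texts = []

    def timed(name):
        # Call the named method, store its wall-clock duration in seconds,
        # and return the parsed JSON payload.
        start = time.perf_counter()
        raw = getattr(recognizer, name)()
        timings[name].append(time.perf_counter() - start)
        return json.loads(raw)

    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # End of utterance: this is the call reported as slow above.
            texts.append(timed("Result").get("text", ""))
        else:
            timed("PartialResult")

    # Flush whatever audio is still buffered at the end of the stream.
    texts.append(timed("FinalResult").get("text", ""))
    return texts, timings
```

With the real library this would presumably be called with something like `KaldiRecognizer(Model("vosk-model-en-us-0.22"), 16000)` and chunks of 16-bit mono PCM read from a WAV file or microphone; the sample rate and model path here are illustrative. Comparing `max(timings["Result"])` across models should show whether the delay is specific to the large model.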
How much memory do you have? Try removing the rnnlm folder from the model; it should respond faster.
Memory is 16 GB. I removed the rnnlm folder and the delay is gone. I also no longer have to wait about 30 seconds for the model to load.
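To try this without modifying the downloaded model, one option is to load a copy that leaves the folder out. The sketch below is only an illustration: the helper name `copy_model_without_rnnlm` and the destination path are hypothetical, and it assumes the rescoring data lives in a folder named `rnnlm` inside the model directory, as described above.

```python
import shutil
from pathlib import Path


def copy_model_without_rnnlm(src, dst):
    """Copy the model directory `src` to `dst`, skipping entries named 'rnnlm'.

    The original model is left untouched. `dst` must not exist yet,
    because shutil.copytree refuses to overwrite an existing directory
    by default.
    """
    src, dst = Path(src), Path(dst)
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("rnnlm"))
    return dst
```

For example, `copy_model_without_rnnlm("vosk-model-en-us-0.22", "vosk-model-en-us-0.22-no-rnnlm")` would produce a second model directory that can be loaded in place of the original.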
OK, probably half of that is taken by other tasks and the remainder is not enough.
I don't think memory is the issue; usage is around 50%.
OK, does it work without RNNLM? Also, could you please update to 0.3.38? It had some performance fixes.
It does work without RNNLM. I tested with 0.3.38 and it is the same as before: a call to Result() can take about 5 seconds (with RNNLM).
OK, thank you for testing. Let's conclude that RNNLM is too slow for your hardware.