speech-recognition-experiments
Does Whisper CT2 (base model) achieve the same speed as Vosk (English large) on CPU?
Does Whisper CT2 (base model) achieve the same speed as Vosk (English large) on CPU only?
It's a bit tricky to answer, because Vosk has a real streaming mode with partial results. That means you don't have to wait until the user has finished speaking; you only have to transcribe the last chunk of audio that is left, while Whisper basically starts transcribing AFTER the user has finished. So the short answer is: the longer you speak, the faster Vosk will be relative to Whisper.
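To make that reasoning concrete, here is a minimal sketch of the wait time after the user stops speaking in both modes. The real-time factors (processing time divided by audio duration) and the chunk size are made-up illustrative values, not measurements of Vosk or Whisper, and the model assumes the streaming engine keeps up with the incoming audio (real-time factor below 1).

```python
def batch_latency(speech_seconds: float, rtf: float) -> float:
    """Wait after end of speech when transcription only starts at that point."""
    return speech_seconds * rtf


def streaming_latency(speech_seconds: float, chunk_seconds: float, rtf: float) -> float:
    """Wait after end of speech when all earlier chunks were already decoded.

    Assumes rtf < 1, so only the final chunk is still pending.
    """
    pending_audio = min(speech_seconds, chunk_seconds)
    return pending_audio * rtf


# Hypothetical values for illustration only (not benchmark results).
BATCH_RTF = 0.25      # batch engine needs 0.25 s per second of audio
STREAM_RTF = 0.5      # streaming engine is slower per second of audio
CHUNK_SECONDS = 0.5   # size of the last audio chunk

if __name__ == "__main__":
    for seconds in (1, 5, 20):
        batch = batch_latency(seconds, BATCH_RTF)
        stream = streaming_latency(seconds, CHUNK_SECONDS, STREAM_RTF)
        print(f"{seconds:>2} s of speech: batch waits {batch:.2f} s, streaming waits {stream:.2f} s")
```

Even with a streaming engine that is slower per second of audio, the streaming wait stays constant in this model while the batch wait grows with the length of the utterance.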
I haven't compared Whisper to Vosk in non-streaming mode yet. Maybe I'll add some tests for that.
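A non-streaming comparison could be timed with something like the sketch below. It is engine-agnostic: `transcribe` is a hypothetical wrapper you would write yourself around each engine's file transcription call, and `audio_seconds` is the known duration of the test file. Nothing here is taken from the repository's actual test code.

```python
import time
from typing import Callable


def real_time_factor(
    transcribe: Callable[[str], str],
    wav_path: str,
    audio_seconds: float,
    runs: int = 3,
) -> float:
    """Return best-of-N processing time divided by audio duration.

    A value below 1.0 means the engine is faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if runs < 1:
        raise ValueError("runs must be at least 1")

    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(wav_path)  # result is ignored, only the time matters here
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds


if __name__ == "__main__":
    # Stand-in engine for demonstration: pretends to work for 10 ms.
    def fake_engine(path: str) -> str:
        time.sleep(0.01)
        return "hello world"

    rtf = real_time_factor(fake_engine, "test.wav", audio_seconds=1.0)
    print(f"real-time factor: {rtf:.3f}")
```

Taking the best of several runs reduces noise from caching and background load. Model loading should happen outside the wrapper so that only transcription time is measured.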
Thank you for creating this comparison. Because of it I tried out faster-whisper, and it is faster than whisper.cpp.
It is indeed, at least on ARM CPUs. You can follow the discussion about it here: https://github.com/ggerganov/whisper.cpp/issues/7#issuecomment-1447752474
It seems to be an optimization issue on ARM. Results on x86 (Intel/AMD) CPUs might be different, and whisper.cpp might catch up to the CT2 version there.
Hi @nyadla-sys, I wrote to you on Twitter via the SEPIA account 🙂