vosk-api

Any tips for better speech recognition in Linux?

Open 40476 opened this issue 1 year ago • 3 comments

I have a few problems I am looking to remedy, mainly speech accuracy when other voices are playing. In a quiet environment Vosk performs wonderfully, but when there is noise or someone else is talking it is unusable for my purpose: I am using it for real-time STTS in voice chats with friends.

40476, Dec 02 '24 01:12

What is your language/accent? You can probably try something modern like Whisper. It depends on many details, such as vocabulary. It is better to separate channels to avoid speech overlap. If the noise source is in your room, there are ways to isolate it. And so on.

nshmyrev, Dec 02 '24 01:12

I would say about Midwestern. I am using sprec, which uses Vosk and prints its output to the terminal. Here is my script:

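#!/bin/bash
notify-send "please wait"
# Capture 16 kHz mono 16-bit audio, recognize it with sprec, keep only the final results,
# then speak them on the TTS_voice device and the default device and log them to the journal.
arecord -q --device front:CARD=U0x46d0x825,DEV=0 -fS16_LE -c1 -r16000 | sprec | grep -oP "final 1: \K.*" | tee >(espeak-ng -d TTS_voice) >(espeak-ng) >(systemd-cat -t eon-speak) &
sleep 3
notify-send "start speaking!"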
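
For a real-time pipeline like the one above, the filtering stage can be a source of delay, because GNU grep may buffer its output when writing to a pipe. The following is only a sketch of that one stage, not a fix confirmed by anyone in this thread. The function name `extract_final` is hypothetical, and it assumes sprec prints final results as lines shaped like `final 1: <text>`, which is what the original grep pattern implies. It also assumes GNU grep built with PCRE support (`-P`).

```shell
#!/bin/bash
# Hypothetical helper (the name is illustrative, not part of sprec or Vosk).
# Reads recognizer output on stdin and prints only the text of final results.
# Assumption: final results arrive as lines like "final 1: some words".
extract_final() {
    # --line-buffered makes GNU grep flush after every output line, so the
    # next stage receives each utterance as soon as it is matched.
    grep --line-buffered -oP 'final 1: \K.*'
}

# Example wiring, commented out because it needs a microphone and sprec:
# arecord -q -fS16_LE -c1 -r16000 | sprec | extract_final | espeak-ng
```

This changes only when matched lines are passed on, not what is recognized, so it would not help with accuracy in noisy rooms.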

I am using one of the GigaSpeech models on my system, since processing overhead is not an issue for me.

40476, Dec 02 '24 01:12

If you installed a small model, as the sprec README suggests, you can also try a bigger model. You can also try Whisper; it is much more accurate than Vosk for English. Vosk has very specific use cases these days.

nshmyrev, Dec 02 '24 02:12