vosk-api: Any tips for better speech recognition on Linux?

I have a few problems I am looking to remedy, mainly speech accuracy when other voices are being played. In a quiet environment Vosk performs wonderfully, but when there is noise or someone else is talking it is absolutely unusable for my purpose, since I am using it for realtime STTS in voice chats with friends.

Dec 02 '24 01:12 40476

What is your language/accent? You could try something more modern, like Whisper. It depends on many details (vocabulary, etc.). It is better to separate the channels to avoid speech overlap. If the noise source is in your room, there are ways to isolate it. And so on.
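When the "other voices" are friends coming out of your own speakers, one way to act on the channel-separation advice is acoustic echo cancellation. This is a minimal sketch, assuming PulseAudio (or PipeWire's pipewire-pulse, which accepts this module on recent versions; check your setup) and its `module-echo-cancel` with the WebRTC backend. The names `ec_mic` and `ec_speakers` and the helper functions are hypothetical. Only audio played through `ec_speakers` gets cancelled, so the voice-chat application has to output there; headphones give the same separation without any software.

```shell
#!/bin/bash
# Sketch: subtract voice-chat playback from the microphone signal using
# module-echo-cancel. Source/sink names are hypothetical; any unused names work.
EC_SOURCE="ec_mic"       # echo-cancelled microphone (what sprec should hear)
EC_SINK="ec_speakers"    # playback sink whose audio is removed from the mic

# Load the echo canceller with the WebRTC backend; pactl prints the module index.
ec_start() {
  pactl load-module module-echo-cancel aec_method=webrtc \
    source_name="$EC_SOURCE" sink_name="$EC_SINK"
}

# Unload the module again, given the index that ec_start printed.
ec_stop() {
  pactl unload-module "$1"
}

# Stream raw 16 kHz, mono, signed 16-bit little-endian PCM from the cleaned
# source, i.e. the same format as `arecord -fS16_LE -c1 -r16000`.
ec_capture() {
  parec -d "$EC_SOURCE" --format=s16le --rate=16000 --channels=1
}
```

To try it, `source` the file, run `id=$(ec_start)`, move the voice chat's playback to `ec_speakers` (for example in pavucontrol), and pipe `ec_capture` into `sprec` in place of the `arecord` call; `ec_stop "$id"` removes the module again.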

Dec 02 '24 01:12 nshmyrev

I would say roughly Midwestern. I am using sprec, which uses Vosk and prints its output to the terminal. Here is my script:

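#!/bin/bash
notify-send "please wait"
# Capture 16 kHz mono audio, transcribe it with sprec, keep only final results,
# then speak them on the TTS_voice device and the default output, and log them to the journal.
# --line-buffered stops grep from holding lines back while writing into a pipe.
arecord -q --device front:CARD=U0x46d0x825,DEV=0 -fS16_LE -c1 -r16000 | sprec | grep --line-buffered -oP "final 1: \K.*" | tee >(espeak-ng -d TTS_voice) >(espeak-ng) >(systemd-cat -t eon-speak) &
# Give sprec a moment to load its model before prompting.
sleep 3
notify-send "start speaking!"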

I am using one of the GigaSpeech models on my system, since processing overhead is not an issue for me.

Dec 02 '24 01:12 40476

If you installed the small model that the sprec README suggests, you can also try a bigger model. You can also try Whisper; it is much more accurate than Vosk for English. Vosk has very specific use cases these days.
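To compare Whisper's accuracy without leaving the shell, a rough sketch is to record short chunks and transcribe each one with the `whisper` command-line tool from the openai-whisper Python package. `CHUNK_SECONDS`, `MODEL` and the helper functions are hypothetical choices, not part of sprec or Vosk. The CLI reloads the model for every chunk and nothing is recorded while a chunk is being transcribed, so this is only good for judging accuracy; for genuinely realtime use, a streaming front end (whisper.cpp ships a streaming example) would be the next thing to look at.

```shell
#!/bin/bash
# Rough sketch: approximate live captioning with the openai-whisper CLI by
# recording short chunks and transcribing them one after another.
# CHUNK_SECONDS, MODEL and the function names are illustrative assumptions.
CHUNK_SECONDS=5
MODEL="small.en"

# Record one chunk of 16 kHz, mono, 16-bit audio into the given WAV file.
record_chunk() {
  arecord -q -t wav -fS16_LE -c1 -r16000 -d "$CHUNK_SECONDS" "$1"
}

# Transcribe a WAV file into a scratch directory and print the plain text.
transcribe_chunk() {
  local outdir
  outdir=$(mktemp -d)
  whisper "$1" --model "$MODEL" --language en --output_format txt \
    --output_dir "$outdir" >/dev/null
  cat "$outdir"/*.txt
  rm -rf "$outdir"
}

# Loop forever: record a chunk, print its transcript, repeat.
# Speech that happens while a chunk is being transcribed is not captured.
whisper_loop() {
  local wav
  wav=$(mktemp --suffix=.wav)
  while true; do
    record_chunk "$wav"
    transcribe_chunk "$wav"
  done
}
```

After `source`-ing the file, `whisper_loop | tee >(espeak-ng)` feeds the same text-to-speech tail as the original script; add the `--device` option from the original `arecord` call to `record_chunk` to use the same microphone.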

Dec 02 '24 02:12 nshmyrev