vosk-api Language identification

Language identification

Open traderboy opened this issue 3 years ago • 7 comments

I have lots of audio files in different languages and I'd like to run them through Vosk to find out which ones contain Russian speakers. I think I can get close by using the Russian model and word-level confidences. But running an English audio file through the same Russian model also returns a lot of results. The confidences are lower than for Russian audio run through the Russian model, but not by enough to be certain.

How can I find the number of words in an audio file that are NOT detected? For example, I have an English audio file that returns 60 words when using an English model, but returns 30 words when I run the same file through the Russian model. It might be useful to know how many words aren't found or have a word confidence of zero. Is that possible? I haven't found anything in the code or examples that does that.

More generally, what's the best way to reasonably determine programmatically that the language is Russian? I'd like to do the same for other languages such as Chinese.

Feb 16 '21 16:02 traderboy
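A minimal sketch of the word-confidence idea from the question, using the Vosk Python API (`Model`, `KaldiRecognizer`, `SetWords`, `AcceptWaveform`, `Result`, `FinalResult`). The helper names `transcribe` and `summarize_words` and the `min_conf=0.5` cutoff are assumptions made for illustration, not anything Vosk prescribes. Vosk only reports the words it did recognize, so this counts low-confidence words rather than "missing" ones, and it is a heuristic, not real language identification.

```python
import json


def summarize_words(result_jsons, min_conf=0.5):
    """Summarize word confidences from Vosk result JSON strings.

    min_conf is an arbitrary illustrative threshold, not a Vosk default.
    """
    confs = []
    for raw in result_jsons:
        # Results with no recognized words carry no "result" list.
        for word in json.loads(raw).get("result", []):
            confs.append(word["conf"])
    total = len(confs)
    low = sum(1 for c in confs if c < min_conf)
    mean = sum(confs) / total if total else 0.0
    return {"words": total, "low_conf": low, "mean_conf": mean}


def transcribe(wav_path, model_dir):
    """Run Vosk over a 16-bit mono PCM WAV file; return result JSON strings."""
    import wave
    from vosk import Model, KaldiRecognizer  # requires `pip install vosk`

    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # include per-word "conf", "start", "end"
        results = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        results.append(rec.FinalResult())
    return results
```

Running `summarize_words(transcribe(path, model_dir))` once per file with the Russian model, then comparing the word count and mean confidence against the same file run through another model, gives the numbers discussed above. Any threshold on them would need tuning on files whose language is known.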

We do not support language identification yet.

Feb 16 '21 16:02 nshmyrev

You can use something external like

https://github.com/py-lidbox/lidbox

http://bark.phon.ioc.ee/voxlingua107/

Feb 16 '21 16:02 nshmyrev

Thanks, the Voxlingua demo is exactly what I need. Unfortunately, they don't provide source code or instructions. I'm trying out lidbox, but it's not clear how to build an application that does what I need.

You wrote "We do not support language identification yet," so it's encouraging to know that it may be added to Vosk someday. I've been able to use both the C and Python code with good results, so it'd be great to continue using Vosk.

Feb 17 '21 20:02 traderboy

@traderboy https://github.com/snakers4/silero-vad

Jul 09 '21 15:07 doublex

Voxlingua code is here:

https://github.com/alumae/torch-xvectors-wav

also

https://github.com/alumae/voxlingua107_sb

Jul 09 '21 15:07 nshmyrev

Related issue #233

Nov 21 '21 23:11 nshmyrev

Also

https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa

and wav2vec based

https://huggingface.co/anton-l/wav2vec2-base-lang-id

Jul 07 '22 12:07 nshmyrev
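A rough sketch of how the SpeechBrain model linked above could be used to pick out Russian files. It relies on SpeechBrain's `EncoderClassifier.from_hparams` and `classify_file`, whose fourth return value is a list of text labels. The helper names `load_classifier` and `files_in_language`, the `savedir` path, and the assumption that the label for Russian contains the string "Russian" are all illustrative, so print a few labels first to check what the model actually returns.

```python
def load_classifier(source="speechbrain/lang-id-commonlanguage_ecapa"):
    """Load a pretrained SpeechBrain language-ID model (downloads on first use)."""
    try:
        # Module path in SpeechBrain 1.x
        from speechbrain.inference.classifiers import EncoderClassifier
    except ImportError:
        # Module path in older SpeechBrain releases
        from speechbrain.pretrained import EncoderClassifier
    return EncoderClassifier.from_hparams(
        source=source,
        savedir="pretrained_models/lang-id",  # hypothetical cache directory
    )


def files_in_language(paths, classify, target="Russian"):
    """Keep the paths whose predicted label contains `target`.

    `classify` is any callable mapping a file path to a label string,
    which keeps this helper independent of the model used.
    """
    return [p for p in paths if target.lower() in classify(p).lower()]


if __name__ == "__main__":
    import sys

    clf = load_classifier()

    def classify(path):
        # classify_file returns (out_prob, score, index, text_lab)
        return clf.classify_file(path)[3][0]

    for path in files_in_language(sys.argv[1:], classify):
        print(path)
```

Passing the classifier in as a callable makes it easy to swap in a different model, such as the wav2vec2-based one, without touching the filtering logic.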
