Language identification

Open traderboy opened this issue 3 years ago • 7 comments

I have many audio files in different languages and I'd like to run them through Vosk to find out which ones contain Russian speakers. I think I can get close by using the Russian model and word-level confidences. However, running an English audio file through the same Russian model also returns a lot of results. The confidences are lower than for Russian audio with the Russian model, but not by enough to be certain.

How can I find the number of words in an audio file that are NOT detected? For example, I have an English audio file that returns 60 words when using an English model, but returns 30 words when the same file is run through the Russian model. It might be useful to know how many words aren't found or have a zero word confidence level. Is that possible? I haven't found anything in the code or examples that does that.

More generally, what's the best way to reasonably determine programmatically that the language is Russian? I'd like to do the same for other languages such as Chinese.

traderboy avatar Feb 16 '21 16:02 traderboy
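
The word-confidence idea from the question can be sketched roughly as below. This is only an illustration, not a language identifier: it assumes the `vosk` Python package, a mono 16-bit PCM WAV file, and that word entries in the JSON result carry a `conf` field once `SetWords(True)` is enabled. The helper names `summarize_confidences` and `recognize_words` and the 0.5 threshold are hypothetical choices made for this example.

```python
import json
import wave


def summarize_confidences(words, low_conf=0.5):
    """Summarize word entries (dicts with a "conf" key) from Vosk results.

    low_conf is an arbitrary illustrative threshold, not a Vosk setting.
    """
    confs = [w["conf"] for w in words]
    if not confs:
        return {"words": 0, "mean_conf": 0.0, "low_conf_words": 0}
    return {
        "words": len(confs),
        "mean_conf": sum(confs) / len(confs),
        "low_conf_words": sum(1 for c in confs if c < low_conf),
    }


def recognize_words(wav_path, model_path):
    """Run one Vosk model over a WAV file and collect per-word entries."""
    # Imported here so the summary helper can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    words = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_path), wf.getframerate())
        rec.SetWords(True)  # ask for per-word timing and confidence
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                words.extend(json.loads(rec.Result()).get("result", []))
        words.extend(json.loads(rec.FinalResult()).get("result", []))
    return words
```

Running `summarize_confidences(recognize_words(path, model_dir))` once per model and comparing the word counts and mean confidences gives the kind of comparison described above. As the question notes, the gap between languages may be too small to decide reliably.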

We do not support language identification yet.

nshmyrev avatar Feb 16 '21 16:02 nshmyrev

You can use something external like

https://github.com/py-lidbox/lidbox

or

http://bark.phon.ioc.ee/voxlingua107/

nshmyrev avatar Feb 16 '21 16:02 nshmyrev

> You can use something external like
>
> https://github.com/py-lidbox/lidbox
>
> or
>
> http://bark.phon.ioc.ee/voxlingua107/

Thanks, the Voxlingua demo is exactly what I need. Unfortunately, they don't provide source code or instructions. I'm trying out lidbox, but it's not clear how to create an application to do what I need.

You wrote "We do not support language identification yet.", so it's encouraging to know that it may be added to Vosk someday. I've been able to use both the C and Python code with good results, so it'd be great to continue using Vosk.

traderboy avatar Feb 17 '21 20:02 traderboy

@traderboy https://github.com/snakers4/silero-vad

doublex avatar Jul 09 '21 15:07 doublex

> Thanks, the Voxlingua demo is exactly what I need. Unfortunately, they don't provide source code or instructions. I'm trying out lidbox, but it's not clear how to create an application to do what I need.

Voxlingua code is here:

https://github.com/alumae/torch-xvectors-wav

also

https://github.com/alumae/voxlingua107_sb

nshmyrev avatar Jul 09 '21 15:07 nshmyrev

Related issue #233

nshmyrev avatar Nov 21 '21 23:11 nshmyrev

Also

https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa

and a wav2vec-based model:

https://huggingface.co/anton-l/wav2vec2-base-lang-id

nshmyrev avatar Jul 07 '22 12:07 nshmyrev
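
For readers who want to try the wav2vec-based model linked above, a minimal sketch follows. It assumes the Hugging Face `transformers` package (with a PyTorch backend and ffmpeg available for decoding audio files) and uses its generic `audio-classification` pipeline. The helper names `classify_file` and `top_language` are hypothetical, and the exact label strings the model returns should be checked against the model card rather than taken from this example.

```python
def top_language(predictions):
    """Return (label, score) for the highest-scoring prediction.

    predictions is a list of {"label": str, "score": float} dicts, the
    shape returned by an audio-classification pipeline.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify_file(audio_path, model_id="anton-l/wav2vec2-base-lang-id"):
    """Classify the spoken language of one audio file."""
    # Imported here so top_language can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    return classifier(audio_path)
```

A call such as `top_language(classify_file("sample.wav"))` would then give the most likely label and its score for a file. For a large batch, building the pipeline once and reusing it across files avoids reloading the model each time.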