vosk-api

Acoustic model training efficiency on noisy data

Open · itsmeju58 opened this issue on May 23, 2023 · 1 comment

Hello! I have a question. There are a few words (for example, 4 words and their different forms) that I want to detect in speech. Other words also need to be transcribed, but their accuracy matters less. In normal audio, these words are recognized well. But in audio with a lot of noise, fast speech, or an unusual voice, these words are quite often not recognized at all.

Would training an acoustic model on noisy data containing these words help in this case? Can recognition of these words be significantly improved? How much training audio would be needed for this?
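Whether adaptation helps can be judged by measuring how often the target words are recovered on the same noisy test set before and after training. Below is a minimal sketch of such a keyword-recall check. It assumes Vosk-style JSON results (a `text` field, plus the word-level `result` list that appears when `SetWords(True)` is enabled on the recognizer); the keyword set, its word forms, and the sample transcripts are hypothetical placeholders.

```python
import json
from collections import Counter

# Hypothetical target keyword and its word forms; replace with the real ones.
TARGET_FORMS = {
    "order": {"order", "orders", "ordered", "ordering"},
}


def keyword_hits(words, target_forms):
    """Count occurrences of each target keyword (any of its forms) in a word list."""
    counts = Counter()
    for w in words:
        w = w.lower()
        for key, forms in target_forms.items():
            if w in forms:
                counts[key] += 1
    return counts


def words_from_vosk_result(result_json):
    """Extract recognized words from a Vosk-style result JSON string.

    Uses the word-level "result" list when present (SetWords(True)),
    otherwise falls back to splitting the "text" field.
    """
    data = json.loads(result_json)
    if "result" in data:
        return [item["word"] for item in data["result"]]
    return data.get("text", "").split()


def keyword_recall(references, hypotheses, target_forms):
    """Fraction of reference keyword occurrences recovered in the hypotheses.

    references: list of reference transcripts (plain strings).
    hypotheses: list of recognizer results (JSON strings), aligned by utterance.
    Per utterance, a keyword is credited at most min(ref_count, hyp_count) times.
    """
    total = Counter()
    found = Counter()
    for ref, hyp in zip(references, hypotheses):
        ref_counts = keyword_hits(ref.split(), target_forms)
        hyp_counts = keyword_hits(words_from_vosk_result(hyp), target_forms)
        for key, n in ref_counts.items():
            total[key] += n
            found[key] += min(n, hyp_counts[key])
    return {key: found[key] / total[key] for key in total}


if __name__ == "__main__":
    refs = ["please confirm the order", "two orders were ordered"]
    hyps = [
        '{"text": "please confirm the order"}',
        '{"result": [{"word": "two", "conf": 1.0, "start": 0.0, "end": 0.3},'
        ' {"word": "orders", "conf": 0.8, "start": 0.3, "end": 0.7},'
        ' {"word": "were", "conf": 0.9, "start": 0.7, "end": 0.9}],'
        ' "text": "two orders were"}',
    ]
    print(keyword_recall(refs, hyps, TARGET_FORMS))
```

Running the same check on a held-out set of noisy recordings with the original model and with the adapted one gives a direct before/after comparison for just the words that matter, which overall word error rate can hide.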

itsmeju58 · May 23 '23 10:05

You can provide audio samples to get help with this issue.

nshmyrev · May 23 '23 21:05