
Value of silence_weight

Open RuABraun opened this issue 2 years ago • 7 comments

Hi Nickolay! I've started using the vosk-api, and I've had a few utterances where the output was worse than when decoding with kaldi's tcp binary (e.g. "no" being recognized instead of "yes"). For most of them (unfortunately not all), I tracked the difference down to silence_weight, which is hardcoded to 1e-3 in vosk.

As I understand it, training is done without this silence weight being used (source), so wouldn't it make more sense to set it to 1.0? What do you think?

RuABraun avatar May 12 '22 10:05 RuABraun

With 1.0 it will not downweight silence frames in the ivector computation. As a result, accuracy will degrade on tests with longer silence (1-2 seconds) at the beginning. It might indeed be worse for short examples without silence, so it is not ideal.

Ideally we move to something like a conformer encoder, which properly detects context. It's a long-term plan though.

nshmyrev avatar May 12 '22 11:05 nshmyrev
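
To picture the downweighting described above, here is a minimal sketch. It is not Kaldi's or vosk's actual ivector code: it assumes the ivector statistics can be stood in for by a weighted mean over 1-D frame features, and `weighted_mean`, the frame values, and the silence labels are all made up for illustration.

```python
def weighted_mean(frames, is_silence, silence_weight):
    """Toy stand-in for ivector statistics: a weighted mean of 1-D frame features.

    Frames flagged as silence get weight `silence_weight`; all others get 1.0.
    (Hypothetical helper, not part of Kaldi or vosk.)
    """
    weights = [silence_weight if sil else 1.0 for sil in is_silence]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, frames)) / total


# 100 leading "silence" frames (feature value 0.0), then 50 speech frames (value 1.0).
frames = [0.0] * 100 + [1.0] * 50
is_silence = [True] * 100 + [False] * 50

# With silence_weight=1e-3 the estimate is dominated by the speech frames.
downweighted = weighted_mean(frames, is_silence, silence_weight=1e-3)

# With silence_weight=1.0 every frame counts equally, so the leading
# silence pulls the estimate toward the silence value.
unweighted = weighted_mean(frames, is_silence, silence_weight=1.0)

print(f"silence_weight=1e-3: {downweighted:.3f}")
print(f"silence_weight=1.0:  {unweighted:.3f}")
```

In this toy setup, the longer the leading silence, the further the unweighted estimate drifts from the speech-only value, which is the effect the small silence weight is meant to limit.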

Are you sure using 1.0 will cause results to degrade? The model has never seen input without 1.0 so it seems to me that changing it could cause problems. The examples I have actually have ~1 second of silence at the start and that is what is being detected as "no".

RuABraun avatar May 12 '22 12:05 RuABraun

> The model has never seen input without 1.0

Models usually don't see long leading silence during training, so the silence-weight value doesn't matter much.

> The examples I have actually have ~1 second of silence at the start and that is what is being detected as "no".

It has to be not just pure silence (the ivector extractor drops such frames) but something like quiet white noise for a substantial period of time (maybe 3-5 seconds), so that ivector estimation breaks. There is also max-count, which plays a role here. Honestly, we didn't test it deeply; it is mostly an ad-hoc setting.

nshmyrev avatar May 12 '22 13:05 nshmyrev
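
The thread does not spell out how max-count interacts with this. As a rough sketch only, assume max-count caps the effective amount of data so that a prior keeps some influence on the estimate. The `map_mean` helper, the prior, and the numbers below are hypothetical and are not taken from Kaldi or vosk.

```python
def map_mean(frame_sum, count, prior_mean, prior_count, max_count=0.0):
    """Toy MAP-style mean with an optional cap on the effective data count.

    Assumption for illustration: when `count` exceeds `max_count`, the
    statistics are scaled down so the effective count equals `max_count`.
    """
    scale = 1.0
    if max_count > 0.0 and count > max_count:
        scale = max_count / count
    return (prior_count * prior_mean + scale * frame_sum) / (prior_count + scale * count)


# 500 frames of quiet noise with feature value 2.0; prior mean 0.0 with weight 10.
uncapped = map_mean(frame_sum=1000.0, count=500.0, prior_mean=0.0, prior_count=10.0)
capped = map_mean(frame_sum=1000.0, count=500.0, prior_mean=0.0, prior_count=10.0,
                  max_count=100.0)

print(f"no cap:        {uncapped:.3f}")
print(f"max_count=100: {capped:.3f}")
```

Under this assumption, a long stretch of noise moves the capped estimate less far from the prior than the uncapped one, which is one way such a setting could interact with the silence weight.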

Hm okay, I will close this issue then.

RuABraun avatar May 12 '22 15:05 RuABraun

I can run experiments a bit later and let you know

nshmyrev avatar May 12 '22 17:05 nshmyrev

I did some more experiments and can confirm your statement that silence_weight 1.0 is worse. Honestly the experience just makes me want to throw out ivectors.

RuABraun avatar May 30 '22 09:05 RuABraun

> Honestly the experience just makes me want to throw out ivectors.

Yes, it is the right direction, they are inherently unstable.

nshmyrev avatar May 30 '22 09:05 nshmyrev