vosk-api

False positives when using dynamic graph models

Open carlos-bailon opened this issue 1 year ago • 3 comments

Hi!

I'm using the vosk-model-small-en-us-0.15 model and providing a grammar at runtime (e.g. ["nose", "teeth", "hair", "[unk]"]).
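For readers following along, a runtime grammar like the one above is passed to the recognizer as a JSON-encoded list of phrases. The following is a minimal sketch, not the exact code from this setup: `build_grammar` and `make_recognizer` are hypothetical helper names, and the 16000 Hz sample rate is an assumption.

```python
import json


def build_grammar(words, unk_token="[unk]"):
    """Return the grammar as a JSON string, with the unknown token appended once."""
    phrases = [w for w in words if w != unk_token]
    phrases.append(unk_token)
    return json.dumps(phrases)


def make_recognizer(model_path, words, sample_rate=16000):
    """Create a recognizer restricted to `words` (hypothetical helper)."""
    # Imported lazily so build_grammar can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_path)
    rec = KaldiRecognizer(model, sample_rate, build_grammar(words))
    rec.SetWords(True)  # include per-word confidence in the result JSON
    return rec


print(build_grammar(["nose", "teeth", "hair"]))
# prints: ["nose", "teeth", "hair", "[unk]"]
```

Audio is then fed with `rec.AcceptWaveform(data)` and read back with `rec.Result()` or `rec.FinalResult()`.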

The issue I'm experiencing is a lot of false positives. When I use the model without grammar "restrictions", it has good accuracy, although it confuses some words with very similar pronunciation (e.g. two -> to). However, once I set the grammar, I get several false positives (e.g. saying "cheek" returns "teeth").

As far as I have read, the only proposed workaround is to filter the recognized words by confidence, but it often returns a false positive with conf=1. I have also tried modifying the acoustic-scale and lattice-beam parameters of the recognizer, but it still fails a lot (although increasing the lattice beam and slightly decreasing the acoustic scale gives a small improvement).
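The confidence workaround mentioned above can be sketched roughly as follows. This assumes word-level results are enabled (`SetWords(True)`), so the result JSON carries a `result` list with `word` and `conf` fields. The function name `filter_by_confidence`, the 0.9 threshold, and the sample JSON strings are illustrative, not real recognizer output.

```python
import json


def filter_by_confidence(result_json, min_conf=0.9, unk_token="[unk]"):
    """Replace every word whose confidence is below min_conf with unk_token."""
    result = json.loads(result_json)
    words = []
    for entry in result.get("result", []):
        if entry["word"] != unk_token and entry["conf"] >= min_conf:
            words.append(entry["word"])
        else:
            words.append(unk_token)
    return words


# Hand-written samples shaped like a word-level result.
low = '{"result": [{"conf": 0.42, "start": 0.5, "end": 0.9, "word": "teeth"}], "text": "teeth"}'
high = '{"result": [{"conf": 1.0, "start": 0.5, "end": 0.9, "word": "teeth"}], "text": "teeth"}'

print(filter_by_confidence(low))   # the low-confidence word is rejected
print(filter_by_confidence(high))  # a conf=1 false positive still passes
```

The second call shows the limitation described here: when a false positive comes back with conf=1, no threshold can reject it.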

Has anyone else experienced this and found a valid solution? Thanks in advance!

carlos-bailon avatar Apr 20 '23 11:04 carlos-bailon

What application are you building, what is the goal? You can just add cheek to the list.

nshmyrev avatar Apr 20 '23 12:04 nshmyrev

Hi, thank you for your response. The set of words I provided is just an example to illustrate the issue. In that specific case, I leave the word "cheek" out of the grammar on purpose, because I don't want to recognize it (if I say "cheek", I want the system to return "[unk]" instead of "teeth").

The issue is that it seems to go for the closest option (teeth) instead of returning "[unk]" (which is what I want, since the word is outside the provided grammar). And as I said, this is just an example; it happens with several other words (when I say one that is outside the grammar, sometimes it returns "[unk]", which is correct, but many other times it returns a false positive with another word from the grammar).

carlos-bailon avatar Apr 21 '23 08:04 carlos-bailon

Adding similar-sounding words to the grammar list helps. For instance, you could add "cheek", and the next time someone says "cheek", there is a high chance that "cheek" gets predicted instead of "teeth".

However, this is not the right way to prevent false positives, and I am also open to hearing any other ideas :)
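One way to combine the similar-sounding-words suggestion with the original requirement (out-of-grammar words should come back as "[unk]") is to treat the extra words as decoys and map them back to "[unk]" after recognition. This is only a sketch of that idea, not a tested fix: the helper names are hypothetical, and it assumes each decoy exists in the model's vocabulary.

```python
import json

UNK = "[unk]"


def grammar_with_decoys(targets, decoys):
    """Build a grammar JSON string containing the targets, the decoys, and [unk]."""
    extra = [d for d in decoys if d not in targets]
    return json.dumps(list(targets) + extra + [UNK])


def map_decoys_to_unk(text, targets):
    """Keep only target words in the recognized text; anything else becomes [unk]."""
    allowed = set(targets)
    return " ".join(w if w in allowed else UNK for w in text.split())


targets = ["nose", "teeth", "hair"]
decoys = ["cheek"]  # similar-sounding words we do NOT want to accept

print(grammar_with_decoys(targets, decoys))
print(map_decoys_to_unk("cheek", targets))
print(map_decoys_to_unk("teeth", targets))
```

The decoy list has to be chosen by hand for each grammar, which is the main drawback of this approach.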

kaveenkumar avatar Feb 15 '24 10:02 kaveenkumar