vosk-api

Confidence value in nbest list is not normalized to 1.0?

Open fquirin opened this issue 3 years ago • 8 comments

How should the "confidence" value be interpreted? The values I usually get range from about 50 to over 500, and anything above 300 seems to be OK.

Here is my setup:

  • Vosk v0.3.30
  • Small EN and DE models (v0.15)
  • Streaming audio chunks in 16 kHz mono
  • Test files

You can reproduce my results using the BETA version of the SEPIA STT Server (there are Docker containers for all platforms):

12:30:17 - {"type":"result","msg_id":7,"code":200,"transcript":" one two three four five","isFinal":true,"confidence":529.317749,"features":{}}
...
12:30:20 - {"type":"result","msg_id":14,"code":200,"transcript":" six seven eight nine ten","isFinal":true,"confidence":557.565491,"features":{}}

The 'confidence' you see in my results is taken directly from 'FinalResult'.

fquirin avatar Jun 24 '21 10:06 fquirin

Just in case anyone wants to reproduce the exact same results with my server, here is the options object for the 'welcome' event:

[image: options object for the 'welcome' event]

The audio file was: test-audio/easy_counting_en2.ogg

fquirin avatar Jun 24 '21 10:06 fquirin

I asked myself the same question (using my Node.js Vosk wrapper: https://github.com/solyarisoftware/voskJs). I believe this is a change in Vosk v0.3.30:

$ voskjs --audio=audio/2830-3980-0043.wav --model=models/vosk-model-small-en-us-0.15 --alternatives=3

model directory      : models/vosk-model-small-en-us-0.15
speech file name     : audio/2830-3980-0043.wav
grammar              : not specified. Default: NO
sample rate          : not specified. Default: 16000
max alternatives     : 3
text only / JSON     : JSON
Vosk debug level     : -1

load model latency   : 362ms

{
  alternatives: [
    {
      confidence: 175.552368,
      result: [
        { end: 1.02, start: 0.36, word: 'experience' },
        { end: 1.35, start: 1.02, word: 'proves' },
        { end: 1.98, start: 1.35, word: 'this' }
      ],
      text: ' experience proves this'
    }
  ]
}

transcript latency : 587ms

By contrast, in a previous Vosk release (e.g. v0.2.27), each item of the result array included a per-word confidence (<= 1): https://github.com/solyarisoftware/voskJs/tree/master/examples#simple-program-for-a-sentence-based-speech-to-text. Here there is a single confidence value for each "alternative" result instead.

The change is not clear to me either. It seems that confidence is now a value for each result (sentence) rather than for each word. I still do not understand why the confidence value is > 1.

solyarisoftware avatar Jun 24 '21 14:06 solyarisoftware

I've just noticed a small related change (maybe a minor format bug).

If I do NOT call setAlternatives(), I get the old format:

{
  result: [
    { conf: 1, end: 1.02, start: 0.36, word: 'experience' },
    { conf: 1, end: 1.35, start: 1.02, word: 'proves' },
    { conf: 1, end: 1.74, start: 1.35, word: 'this' }
  ],
  text: 'experience proves this'
}

So the confidence is set to 1 for each word (which makes sense, but is perhaps not very useful).

If I do call setAlternatives(), I get the new format instead:

{
  alternatives: [
    {
      confidence: 197.583099,
      result: [
        { end: 1.02, start: 0.36, word: 'experience' },
        { end: 1.35, start: 1.02, word: 'proves' },
        { end: 1.98, start: 1.35, word: 'this' }
      ],
      text: ' experience proves this'
    }
  ]
}

These are minor points, just a reminder.
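
Client code that has to cope with both shapes could branch on the presence of the "alternatives" key. The following is a minimal Python sketch, not part of Vosk: the helper name summarize_result is hypothetical, and it assumes the first entry of "alternatives" is the top hypothesis, which the outputs above do not confirm.

```python
import json


def summarize_result(raw):
    """Return (text, sentence_confidence, word_confidences) for either result shape.

    `raw` may be a JSON string or an already-parsed dict.
    """
    data = json.loads(raw) if isinstance(raw, str) else raw

    if "alternatives" in data:
        # N-best shape: one unnormalized "confidence" per alternative,
        # and no per-word "conf" field.
        # Assumption: the first alternative is the top hypothesis.
        best = data["alternatives"][0]
        return best["text"].strip(), best["confidence"], None

    # Default shape: per-word "conf" values, no sentence-level score.
    words = data.get("result", [])
    return data.get("text", "").strip(), None, [w["conf"] for w in words]
```

Returning None for whichever field is missing keeps the two scales apart, so a per-word value in [0, 1] is never compared with a sentence-level value such as 197.58.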

solyarisoftware avatar Jun 24 '21 15:06 solyarisoftware

> result object items included the confidence ( <=1 ) for each word

Yes, you are right. There is another funny thing: If you set alternatives to 0 and words to true you get "confidence: 1" for each word, if you set alternatives to 1 (which is essentially the same as 0) the "confidence" field for each word doesn't show up at all ;-)

fquirin avatar Jun 24 '21 17:06 fquirin

The alternatives confidence is not fully functional yet; we will change it in coming versions.

Alternatives 0 enables MBR mode, which gives per-word confidences. That's a different story.

nshmyrev avatar Jun 24 '21 21:06 nshmyrev

I see. What about the general confidence values ~500 etc. (alternatives 1, words false)? ~Can we simply scale this by some factor or does it depend on dynamic properties like length of the input ... ?~ Should we ignore this for now?
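
If only a relative ranking within one n-best list is needed, one option is a softmax over the list's confidence values, which yields scores that sum to 1. This is a sketch under a strong assumption that the thread does not confirm: that the values behave like unnormalized log-scores. It is not how Vosk computes or normalizes confidence, the function name relative_scores is hypothetical, and the output is not a calibrated probability.

```python
import math


def relative_scores(confidences):
    """Softmax over raw n-best confidences; the results sum to 1.

    Assumption: the raw values behave like unnormalized log-scores.
    """
    if not confidences:
        return []
    # Subtract the maximum before exponentiating so that values in the
    # hundreds (e.g. 529.3) do not overflow math.exp.
    peak = max(confidences)
    weights = [math.exp(c - peak) for c in confidences]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single alternative the score is always 1.0, so this says nothing about absolute quality. It also does not make values from different utterances comparable, which is what an absolute threshold such as ">300" would require.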

fquirin avatar Jun 25 '21 07:06 fquirin

Hi @nshmyrev

> Alternatives 0 enables mbr mode which gives confidences per-word, its a different story.

Ok, considering my previous example, I guess that in this line

 { conf: 1, end: 1.02, start: 0.36, word: 'experience' },

the attribute conf is the Minimum Bayes Risk (MBR) confidence.

But I'm still perplexed: in almost all (but not all) of my tests I get the value 1 when the entire sentence is recognized correctly, but in some cases I get values different from 1. For example, with this audio: https://github.com/solyarisoftware/voskJs/blob/master/audio/8455-210777-0068.wav I get conf: 0.85313 for the first word, your:

$ voskjs --audio=audio/8455-210777-0068.wav --model=models/vosk-model-small-en-us-0.15

model directory      : models/vosk-model-small-en-us-0.15
speech file name     : audio/8455-210777-0068.wav
grammar              : not specified. Default: NO
sample rate          : not specified. Default: 16000
max alternatives     : undefined
text only / JSON     : JSON
Vosk debug level     : -1
load model latency   : 313ms
transcript text      : your power is sufficient i said
transcript latency   : 754ms

  TIME EVENT       VOSK RESULT OBJECT
------ ----------- ------------------
    70 partial     { partial: '' }
    74 partial     { partial: '' }
   456 partial     { partial: '' }
   548 partial     { partial: 'your' }
   620 partial     { partial: 'your power is' }
   718 partial     { partial: 'your power is sufficient i said' }
   739 partial     { partial: 'your power is sufficient i said' }
   754 final       { result: [ { conf: 0.85313, end: 0.75, start: 0.54, word: 'your' }, { conf: 1, end: 1.08, start: 0.75, word: 'power' }, { conf: 1, end: 1.23, start: 1.08, word: 'is' }, { conf: 1, end: 1.74, start: 1.23, word: 'sufficient' }, { conf: 1, end: 1.83, start: 1.74, word: 'i' }, { conf: 1, end: 2.16, start: 1.83, word: 'said' } ], text: 'your power is sufficient i said' }

That's not fully clear to me. Documentation on what the conf attribute means would be very welcome. Thanks.
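
Until the attribute is documented, per-word values can at least be used to flag words for review. A minimal Python sketch: the function name low_confidence_words and the 0.9 threshold are illustrative assumptions, not Vosk recommendations, and the sample data is the final result shown above.

```python
def low_confidence_words(result, threshold=0.9):
    """Return the words whose per-word "conf" is below `threshold`.

    Words without a "conf" field (as in the n-best format) are skipped.
    """
    return [
        w["word"]
        for w in result.get("result", [])
        if "conf" in w and w["conf"] < threshold
    ]


# Final result from the run above.
final = {
    "result": [
        {"conf": 0.85313, "end": 0.75, "start": 0.54, "word": "your"},
        {"conf": 1, "end": 1.08, "start": 0.75, "word": "power"},
        {"conf": 1, "end": 1.23, "start": 1.08, "word": "is"},
        {"conf": 1, "end": 1.74, "start": 1.23, "word": "sufficient"},
        {"conf": 1, "end": 1.83, "start": 1.74, "word": "i"},
        {"conf": 1, "end": 2.16, "start": 1.83, "word": "said"},
    ],
    "text": "your power is sufficient i said",
}

print(low_confidence_words(final))  # ['your']
```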

solyarisoftware avatar Jun 25 '21 08:06 solyarisoftware

Where can I find this function call, setAlternatives()?
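
The thread does not say which API setAlternatives() belongs to. In the Python binding of vosk-api, the recognizer methods that control these options are SetMaxAlternatives() and SetWords(). A minimal sketch follows; the helper name configure_recognizer is hypothetical, and mapping the thread's setAlternatives() onto SetMaxAlternatives() is an assumption.

```python
def configure_recognizer(rec, max_alternatives=0, words=True):
    """Apply the options discussed in this thread to a Vosk recognizer.

    `rec` is expected to be a vosk.KaldiRecognizer (Python binding).
    Per the thread, max_alternatives=0 gives per-word "conf" values,
    while a value >= 1 gives the n-best "alternatives" format.
    """
    rec.SetWords(words)
    rec.SetMaxAlternatives(max_alternatives)
    return rec


# Usage with the real library (requires `pip install vosk` and a model):
#   from vosk import Model, KaldiRecognizer
#   model = Model("models/vosk-model-small-en-us-0.15")
#   rec = configure_recognizer(KaldiRecognizer(model, 16000), max_alternatives=3)
```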

ester-levi avatar Jul 10 '22 09:07 ester-levi