vosk-api
Confidence value in nbest list is not normalized to 1.0?
I was wondering how to interpret the "confidence" value. The values I usually get range from ~50 to >500, and it looks like anything >300 is OK.
Here is my setup:
- Vosk v0.3.30
- Small EN and DE models (v0.15)
- Streaming audio chunks, 16 kHz mono
- Test files
You can reproduce my results using the BETA version of the SEPIA STT Server (there are Docker containers for all platforms):
12:30:17 - {"type":"result","msg_id":7,"code":200,"transcript":" one two three four five","isFinal":true,"confidence":529.317749,"features":{}}
...
12:30:20 - {"type":"result","msg_id":14,"code":200,"transcript":" six seven eight nine ten","isFinal":true,"confidence":557.565491,"features":{}}
The 'confidence' you see in my results is taken directly from 'FinalResult'.
Just in case anyone wants to reproduce the exact same results with my server, here is the options object for the 'welcome' event:
The audio file was: test-audio/easy_counting_en2.ogg
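To compare runs, the transcript and confidence can be pulled out of log lines like the two shown above. A minimal Python sketch, assuming every line has the form `HH:MM:SS - {json}` exactly as printed here; the helper `parse_result_line` is hypothetical and not part of SEPIA or Vosk:

```python
import json

def parse_result_line(line: str):
    """Split a 'HH:MM:SS - {json}' log line into (timestamp, transcript, confidence).

    Returns None for anything that is not a final result message.
    """
    # Assumes the timestamp and the JSON payload are separated by the first " - ".
    timestamp, _, payload = line.partition(" - ")
    msg = json.loads(payload)
    if msg.get("type") != "result" or not msg.get("isFinal"):
        return None
    # The leading space in "transcript" comes from the server output; strip it.
    return timestamp, msg["transcript"].strip(), msg.get("confidence")

lines = [
    '12:30:17 - {"type":"result","msg_id":7,"code":200,"transcript":" one two three four five","isFinal":true,"confidence":529.317749,"features":{}}',
    '12:30:20 - {"type":"result","msg_id":14,"code":200,"transcript":" six seven eight nine ten","isFinal":true,"confidence":557.565491,"features":{}}',
]

for line in lines:
    print(parse_result_line(line))
```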
I asked myself the same question (using my Node.js Vosk wrapper: https://github.com/solyarisoftware/voskJs). I believe it's a change introduced in Vosk v0.3.30:
$ voskjs --audio=audio/2830-3980-0043.wav --model=models/vosk-model-small-en-us-0.15 --alternatives=3
model directory : models/vosk-model-small-en-us-0.15
speech file name : audio/2830-3980-0043.wav
grammar : not specified. Default: NO
sample rate : not specified. Default: 16000
max alternatives : 3
text only / JSON : JSON
Vosk debug level : -1
load model latency : 362ms
{
alternatives: [
{
confidence: 175.552368,
result: [
{ end: 1.02, start: 0.36, word: 'experience' },
{ end: 1.35, start: 1.02, word: 'proves' },
{ end: 1.98, start: 1.35, word: 'this' }
],
text: ' experience proves this'
}
]
}
transcript latency : 587ms
By contrast, in previous Vosk releases (e.g. v0.2.27), each item of the result array included a per-word confidence (<= 1): https://github.com/solyarisoftware/voskJs/tree/master/examples#simple-program-for-a-sentence-based-speech-to-text, whereas here there is a single confidence value for each "alternatives" entry.
The change isn't clear to me either. It seems that confidence is now one value per result (sentence) instead of one per word. I still do not understand why the confidence value is > 1.
I've just noticed a small related change (maybe a minor format bug).
If I do NOT call setAlternatives(), I get the old format:
{
result: [
{ conf: 1, end: 1.02, start: 0.36, word: 'experience' },
{ conf: 1, end: 1.35, start: 1.02, word: 'proves' },
{ conf: 1, end: 1.74, start: 1.35, word: 'this' }
],
text: 'experience proves this'
}
So the confidence is set to 1 for each word (which makes sense, but is maybe useless).
Instead, if I specify setAlternatives(), I get the new format:
{
alternatives: [
{
confidence: 197.583099,
result: [
{ end: 1.02, start: 0.36, word: 'experience' },
{ end: 1.35, start: 1.02, word: 'proves' },
{ end: 1.98, start: 1.35, word: 'this' }
],
text: ' experience proves this'
}
]
}
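Code that consumes these results has to cope with both shapes. A minimal Python sketch, assuming the input is the raw JSON string returned by the recognizer's FinalResult() (the objects printed above are the wrapper's JavaScript rendering of that JSON); `extract_confidences` is a hypothetical helper, not a Vosk function:

```python
import json

def extract_confidences(final_result: str) -> dict:
    """Return the confidence information found in a Vosk result JSON string.

    Handles the two shapes shown in this thread:
      * old shape:   {"result": [{"conf": ..., "word": ...}, ...], "text": ...}
      * n-best shape: {"alternatives": [{"confidence": ..., "text": ...}, ...]}
    """
    data = json.loads(final_result)
    if "alternatives" in data:
        # One score per hypothesis; words carry no "conf" field in this shape.
        scores = [(alt.get("text", "").strip(), alt.get("confidence"))
                  for alt in data["alternatives"]]
        return {"mode": "alternatives", "scores": scores}
    # Per-word "conf" values; None if a word has no "conf" field.
    scores = [(w["word"], w.get("conf")) for w in data.get("result", [])]
    return {"mode": "words", "scores": scores}

old = ('{"result": [{"conf": 1, "end": 1.02, "start": 0.36, "word": "experience"},'
       ' {"conf": 1, "end": 1.35, "start": 1.02, "word": "proves"},'
       ' {"conf": 1, "end": 1.74, "start": 1.35, "word": "this"}],'
       ' "text": "experience proves this"}')
new = ('{"alternatives": [{"confidence": 197.583099, "result": ['
       '{"end": 1.02, "start": 0.36, "word": "experience"},'
       ' {"end": 1.35, "start": 1.02, "word": "proves"},'
       ' {"end": 1.98, "start": 1.35, "word": "this"}],'
       ' "text": " experience proves this"}]}')

print(extract_confidences(old))
print(extract_confidences(new))
```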
Minor points. Just a reminder.
result object items included the confidence ( <=1 ) for each word
Yes, you are right. There is another funny thing: if you set alternatives to 0 and words to true, you get a confidence of 1 for each word; if you set alternatives to 1 (which is essentially the same as 0), the per-word confidence field doesn't show up at all ;-)
Confidence for alternatives is not fully functional yet; we will change it in coming versions.
Alternatives 0 enables MBR mode, which gives per-word confidences; that's a different story.
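For reference, a minimal sketch of switching between the two modes with the Vosk Python bindings, whose KaldiRecognizer exposes SetMaxAlternatives() and SetWords() (an assumption for this thread, which uses SEPIA and voskJs instead). The wrapper `configure_recognizer` and its `mode` names are hypothetical; the real import is left commented out so the sketch runs without a model:

```python
def configure_recognizer(rec, mode: str, max_alternatives: int = 3):
    """Put a Vosk KaldiRecognizer into one of the two output modes discussed here.

    mode="words":        alternatives 0 -> MBR mode, per-word "conf" values
    mode="alternatives": n-best list, one "confidence" score per hypothesis
    """
    if mode == "words":
        rec.SetMaxAlternatives(0)  # 0 = MBR mode, per the maintainer's comment
        rec.SetWords(True)         # request the per-word "result" array
    elif mode == "alternatives":
        if max_alternatives < 1:
            raise ValueError("max_alternatives must be >= 1 for n-best output")
        rec.SetMaxAlternatives(max_alternatives)
        rec.SetWords(True)         # also request word-level output
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return rec

# Usage with the real bindings (requires `pip install vosk` and a downloaded model):
# from vosk import Model, KaldiRecognizer
# model = Model("models/vosk-model-small-en-us-0.15")
# rec = configure_recognizer(KaldiRecognizer(model, 16000), "words")
```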
I see. What about the general confidence values of ~500 etc. (alternatives 1, words false)? ~Can we simply scale them by some factor, or do they depend on dynamic properties like the length of the input?~ Should we just ignore them for now?
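If the goal is only to compare the hypotheses of one utterance, the raw n-best scores can be mapped to values that sum to 1.0 with a softmax. This is a generic heuristic, not what Vosk does, and it rests on an unverified assumption: that each "confidence" behaves like a log-domain score where higher is better. The `scale` parameter is a hypothetical tuning knob (a temperature), not a Vosk setting. Given the maintainer's note that alternatives confidence is not fully functional yet, treat the result as a relative ranking only:

```python
import math
from typing import List

def nbest_posteriors(scores: List[float], scale: float = 1.0) -> List[float]:
    """Softmax over raw n-best scores so that the returned values sum to 1.0.

    Assumes scores are log-domain and comparable within a single utterance.
    """
    if not scores:
        return []
    scaled = [s * scale for s in scores]
    top = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Note that a single alternative always maps to 1.0, so this cannot turn one isolated score like ~500 into an absolute confidence.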
Hi @nshmyrev
Alternatives 0 enables mbr mode which gives confidences per-word, its a different story.
OK, considering my previous example, I guess that in this line:
{ conf: 1, end: 1.02, start: 0.36, word: 'experience' },
the attribute conf is the Minimum Bayes Risk (MBR) confidence.
But I'm still perplexed: in almost all (but not all) of my tests I get the value 1 when the entire sentence is successfully recognized. In some cases, though, I get values different from 1, as with this audio: https://github.com/solyarisoftware/voskJs/blob/master/audio/8455-210777-0068.wav, where I get conf: 0.85313 for the first word, your:
$ voskjs --audio=audio/8455-210777-0068.wav --model=models/vosk-model-small-en-us-0.15
model directory : models/vosk-model-small-en-us-0.15
speech file name : audio/8455-210777-0068.wav
grammar : not specified. Default: NO
sample rate : not specified. Default: 16000
max alternatives : undefined
text only / JSON : JSON
Vosk debug level : -1
load model latency : 313ms
transcript text : your power is sufficient i said
transcript latency : 754ms
TIME EVENT VOSK RESULT OBJECT
------ ----------- ------------------
70 partial { partial: '' }
74 partial { partial: '' }
456 partial { partial: '' }
548 partial { partial: 'your' }
620 partial { partial: 'your power is' }
718 partial { partial: 'your power is sufficient i said' }
739 partial { partial: 'your power is sufficient i said' }
754 final { result: [ { conf: 0.85313, end: 0.75, start: 0.54, word: 'your' }, { conf: 1, end: 1.08, start: 0.75, word: 'power' }, { conf: 1, end: 1.23, start: 1.08, word: 'is' }, { conf: 1, end: 1.74, start: 1.23, word: 'sufficient' }, { conf: 1, end: 1.83, start: 1.74, word: 'i' }, { conf: 1, end: 2.16, start: 1.83, word: 'said' } ], text: 'your power is sufficient i said' }
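One way to act on these per-word MBR values is to flag words below a threshold, or to summarize the utterance by its weakest word. A minimal Python sketch using the final result above rewritten as a Python dict; the 0.9 threshold and both helper names are hypothetical choices, not Vosk defaults or APIs:

```python
from typing import Dict, List, Tuple

def low_confidence_words(result: Dict, threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Return (word, conf) pairs whose per-word confidence is below `threshold`.

    Words without a "conf" field are skipped.
    """
    return [(w["word"], w["conf"]) for w in result.get("result", [])
            if "conf" in w and w["conf"] < threshold]

def min_word_confidence(result: Dict) -> float:
    """Utterance-level score as the minimum per-word conf (one possible heuristic)."""
    confs = [w["conf"] for w in result.get("result", []) if "conf" in w]
    return min(confs) if confs else float("nan")

final = {
    "result": [
        {"conf": 0.85313, "end": 0.75, "start": 0.54, "word": "your"},
        {"conf": 1, "end": 1.08, "start": 0.75, "word": "power"},
        {"conf": 1, "end": 1.23, "start": 1.08, "word": "is"},
        {"conf": 1, "end": 1.74, "start": 1.23, "word": "sufficient"},
        {"conf": 1, "end": 1.83, "start": 1.74, "word": "i"},
        {"conf": 1, "end": 2.16, "start": 1.83, "word": "said"},
    ],
    "text": "your power is sufficient i said",
}

print(low_confidence_words(final))  # [('your', 0.85313)]
print(min_word_confidence(final))   # 0.85313
```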
That's not fully clear to me. Documentation on what the conf attribute means would be very welcome. Thanks!
Where can I find this function: setAlternatives()?