whisper.cpp
Either -dtw doesn't work as intended or I'm missing something
I'm testing with the large.v2 model.
Here's the command:
./main -m models/ggml-large-v2.bin -f samples/jfk.wav -dtw large.v2 -ojf -pp -ls
Here's the JSON output. I've removed the timestamps for clarity, since they match the offsets.
{
"text": " And",
"offsets": {
"from": 320,
"to": 370
},
"id": 400,
"p": 0.644984,
"t_dtw": 56
},
{
"text": " so",
"offsets": {
"from": 370,
"to": 530
},
"id": 370,
"p": 0.904659,
"t_dtw": 90
},
{
"text": ",",
"offsets": {
"from": 690,
"to": 860
},
"id": 11,
"p": 0.370488,
"t_dtw": 108
},
{
"text": " my",
"offsets": {
"from": 860,
"to": 1110
},
"id": 452,
"p": 0.900208,
"t_dtw": 124
},
{
"text": " fellow",
"offsets": {
"from": 1110,
"to": 1850
},
"id": 7177,
"p": 0.814694,
"t_dtw": 158
},
How is one meant to interpret the t_dtw field? If I run without the -dtw option, it's -1; if I run with it, I get the numbers above. I've tried every combination I can think of to work out how t_dtw is meant to be used, but I can't find a pattern. Am I missing something here? Even if the values are in hundredths of a second, they still don't line up with the audio, and the offsets are more accurate.
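To make the "hundredths of a second" comparison concrete, here is a small sketch that takes the five tokens from the excerpt above and checks where each t_dtw would land if it were in 10 ms units. The offsets are in milliseconds (they match the timestamps, e.g. 320 is 00:00:00.320). The 10 ms unit for t_dtw is an assumption being tested here, not confirmed whisper.cpp behaviour, and the names `TOKENS`, `CS_TO_MS` and `compare` are hypothetical helpers for this check only.

```python
# Tokens copied from the JSON excerpt: (text, offsets.from in ms, offsets.to in ms, t_dtw).
TOKENS = [
    (" And", 320, 370, 56),
    (" so", 370, 530, 90),
    (",", 690, 860, 108),
    (" my", 860, 1110, 124),
    (" fellow", 1110, 1850, 158),
]

# ASSUMPTION: t_dtw is in 10 ms units (hundredths of a second), so multiply by 10 for ms.
CS_TO_MS = 10


def compare(tokens):
    """Return (text, dtw_ms, inside_offsets, shift_from_start_ms) for each token."""
    rows = []
    for text, start_ms, end_ms, t_dtw in tokens:
        dtw_ms = t_dtw * CS_TO_MS                # t_dtw converted under the assumption above
        inside = start_ms <= dtw_ms <= end_ms    # does it fall inside the token's offset window?
        rows.append((text, dtw_ms, inside, dtw_ms - start_ms))
    return rows


if __name__ == "__main__":
    for text, dtw_ms, inside, shift in compare(TOKENS):
        print(f"{text!r:10} t_dtw={dtw_ms:5d} ms  inside offsets: {inside!s:5}  "
              f"shift vs 'from': {shift:+d} ms")
```

Under that assumption, only " fellow" falls inside its offset window. The other four tokens land 240 to 530 ms after their "from" offset, which is the mismatch described above.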