
Either -dtw doesn't work as intended or I'm missing something

Open · magnacartatron opened this issue 1 month ago • 0 comments

I'm testing with the large-v2 model. Here's the command:

```shell
./main -m models/ggml-large-v2.bin -f samples/jfk.wav -dtw large.v2 -ojf -pp -ls
```

Here's the JSON output. I've removed the timestamps for clarity, since they match the offsets.

```json
{
	"text": " And",
	"offsets": {
		"from": 320,
		"to": 370
	},
	"id": 400,
	"p": 0.644984,
	"t_dtw": 56
},
{
	"text": " so",
	"offsets": {
		"from": 370,
		"to": 530
	},
	"id": 370,
	"p": 0.904659,
	"t_dtw": 90
},
{
	"text": ",",
	"offsets": {
		"from": 690,
		"to": 860
	},
	"id": 11,
	"p": 0.370488,
	"t_dtw": 108
},
{
	"text": " my",
	"offsets": {
		"from": 860,
		"to": 1110
	},
	"id": 452,
	"p": 0.900208,
	"t_dtw": 124
},
{
	"text": " fellow",
	"offsets": {
		"from": 1110,
		"to": 1850
	},
	"id": 7177,
	"p": 0.814694,
	"t_dtw": 158
},
```
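To compare `t_dtw` against the offsets for every token at once, the full JSON output can be flattened into one row per token. This is a minimal sketch, assuming (not confirmed in this issue) that whisper.cpp nests tokens under `transcription[i]["tokens"]`. `SAMPLE` holds only the five tokens quoted above, and `token_rows` is a hypothetical helper name. With a real file you would pass the result of `json.load` on the `-ojf` output instead.

```python
import json

# Hypothetical stand-in for a whisper.cpp -ojf file, trimmed to the five
# tokens quoted above. Assumption: tokens live under transcription[i]["tokens"].
SAMPLE = """
{"transcription": [{"tokens": [
  {"text": " And", "offsets": {"from": 320, "to": 370}, "id": 400, "p": 0.644984, "t_dtw": 56},
  {"text": " so", "offsets": {"from": 370, "to": 530}, "id": 370, "p": 0.904659, "t_dtw": 90},
  {"text": ",", "offsets": {"from": 690, "to": 860}, "id": 11, "p": 0.370488, "t_dtw": 108},
  {"text": " my", "offsets": {"from": 860, "to": 1110}, "id": 452, "p": 0.900208, "t_dtw": 124},
  {"text": " fellow", "offsets": {"from": 1110, "to": 1850}, "id": 7177, "p": 0.814694, "t_dtw": 158}
]}]}
"""


def token_rows(doc):
    """Flatten every token into a (text, from, to, t_dtw) tuple.

    A missing t_dtw is reported as -1, matching what the issue reports
    when -dtw is not passed.
    """
    rows = []
    for segment in doc.get("transcription", []):
        for tok in segment.get("tokens", []):
            off = tok["offsets"]
            rows.append((tok["text"], off["from"], off["to"], tok.get("t_dtw", -1)))
    return rows


if __name__ == "__main__":
    for text, start, end, t_dtw in token_rows(json.loads(SAMPLE)):
        print(f"{text!r:10} offsets {start:>5}-{end:<5} t_dtw={t_dtw}")
```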

How is one meant to interpret the t_dtw field? If I run without the -dtw option, it's -1; if I run with it, I get the values shown above. I've tried every combination I can think of to figure out how t_dtw can be used, but I can't find a pattern. Am I missing something here? Even if I read the values as hundredths of a second, they still don't line up with the audio, and the offsets are more accurate.
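The unit check described above can be made explicit: scale each raw `t_dtw` by a candidate factor and count how many values land inside the token's `[from, to]` offset window. This sketch assumes the offsets are in milliseconds and tests only two readings of `t_dtw` (raw milliseconds and hundredths of a second). `TOKENS`, `HYPOTHESES`, and `hits` are hypothetical names, and the data is copied from the excerpt. It only reproduces the mismatch described in the question; it does not settle which unit whisper.cpp intends.

```python
# (text, offsets.from, offsets.to, t_dtw) copied from the JSON excerpt.
TOKENS = [
    (" And", 320, 370, 56),
    (" so", 370, 530, 90),
    (",", 690, 860, 108),
    (" my", 860, 1110, 124),
    (" fellow", 1110, 1850, 158),
]

# Candidate factors for converting a raw t_dtw value to milliseconds.
HYPOTHESES = {"milliseconds": 1, "centiseconds": 10}


def hits(tokens, scale):
    """Return the texts of tokens whose scaled t_dtw falls inside [from, to]."""
    return [text for text, start, end, t in tokens if start <= t * scale <= end]


if __name__ == "__main__":
    for name, scale in HYPOTHESES.items():
        inside = hits(TOKENS, scale)
        print(f"{name}: {len(inside)}/{len(TOKENS)} inside offsets -> {inside}")
```

With this data, the millisecond reading places no token inside its window, and the centisecond reading matches only " fellow", which also has the widest window (740 ms).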

magnacartatron · May 14 '24 06:05