Add DTW token timestamps

Open obvirm opened this issue 3 weeks ago • 0 comments

Benchmark Results with `samples/jfk.wav`

Command Used:

./whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav --dtw base.en --max-len 1 --output-srt

Before (Master Branch)

Problem: Zero-duration tokens

00:00:00,000 --> 00:00:00,000   (empty - 0ms!)
00:00:03,500 --> 00:00:03,500   has (0ms!)
00:00:06,600 --> 00:00:06,600   , (0ms!)
00:00:10,300 --> 00:00:10,300   , (0ms!)

Tokens appear and disappear instantly, which makes the output unusable for karaoke-style subtitles.

After (This PR)

Fixed: All tokens have readable duration

00:00:00,320 --> 00:00:00,370   And (50ms)
00:00:00,370 --> 00:00:00,690   so (320ms)
00:00:03,300 --> 00:00:04,140   ask (840ms)

Every token displays long enough to read - karaoke-ready.
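The per-token durations quoted above (50ms, 320ms, 840ms) can be re-derived from the SRT timing lines. The snippet below is a minimal sketch for checking an output file, not part of the PR: the helper names `cue_durations_ms` and `summarize` are hypothetical, and the 10ms threshold is only an assumption mirroring the "Tokens < 10ms" row in the metrics table.

```python
import re

# Matches an SRT timing line such as "00:00:00,320 --> 00:00:00,370".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert SRT timestamp fields (strings) to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def cue_durations_ms(srt_text):
    """Return the duration in ms of every cue found in the SRT text."""
    durations = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        start = to_ms(*fields[:4])
        end = to_ms(*fields[4:])
        durations.append(end - start)
    return durations


def summarize(durations, short_ms=10):
    """Percentage of zero-duration cues and of cues shorter than short_ms.

    Zero-duration cues are also counted as short (0 < short_ms).
    """
    total = len(durations)
    if total == 0:
        return {"cues": 0, "zero_pct": 0.0, "short_pct": 0.0}
    zero = sum(1 for d in durations if d == 0)
    short = sum(1 for d in durations if d < short_ms)
    return {
        "cues": total,
        "zero_pct": 100.0 * zero / total,
        "short_pct": 100.0 * short / total,
    }


if __name__ == "__main__":
    sample = (
        "00:00:00,320 --> 00:00:00,370\n"
        "00:00:00,370 --> 00:00:00,690\n"
        "00:00:03,300 --> 00:00:04,140\n"
    )
    print(cue_durations_ms(sample))
    print(summarize(cue_durations_ms(sample)))
```

To apply it to a real run, read the SRT file produced by the command above and pass its text to `cue_durations_ms`.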

Key Improvements:

| Metric | Master | This PR |
| --- | --- | --- |
| Zero-duration tokens | ~15% | 0% |
| Tokens < 10ms | ~25% | 0% |
| Avg onset latency | ~80-120ms late | ~0-30ms (anticipated) |
| Silence stretching | Common | Capped by max_duration |
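To illustrate what capping silence stretching with `max_duration` could look like, here is a minimal sketch. It is an assumption about the general idea, not the PR's actual implementation: the function `token_spans`, its parameters, and the 1000ms default are hypothetical. Each token ends at the next token's onset, unless that would make it longer than the cap.

```python
def token_spans(onsets_ms, audio_end_ms, max_duration_ms=1000):
    """Build (start, end) spans from token onset times in milliseconds.

    A token normally lasts until the next token's onset. If a long silence
    follows, the end is capped at start + max_duration_ms so the token does
    not stay on screen for the whole pause.
    """
    spans = []
    for i, start in enumerate(onsets_ms):
        # The last token has no successor, so it runs to the end of the audio.
        next_start = onsets_ms[i + 1] if i + 1 < len(onsets_ms) else audio_end_ms
        end = min(next_start, start + max_duration_ms)
        spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Hypothetical onsets; the gap after the second token is a long pause.
    print(token_spans([320, 370, 3300], audio_end_ms=11000))
```

This sketch does not handle two tokens that share the same onset, which would still produce a zero-length span; a minimum-duration rule would be needed for that case.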

Test Audio

All results use the standard `samples/jfk.wav` (JFK speech) included in the repository.

Happy to provide more benchmarks or address any concerns!

Dec 30 '25 09:12 obvirm
