whisper.cpp

Add DTW token timestamps

Open • obvirm opened this issue 3 weeks ago • 0 comments

Benchmark Results with samples/jfk.wav

Command Used:

./whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav --dtw base.en --max-len 1 --output-srt

Before (Master Branch)

Problem: Zero-duration tokens

00:00:00,000 --> 00:00:00,000   (empty - 0ms!)
00:00:03,500 --> 00:00:03,500   has (0ms!)
00:00:06,600 --> 00:00:06,600   , (0ms!)
00:00:10,300 --> 00:00:10,300   , (0ms!)

Tokens appear and disappear instantly, which makes the output unusable for karaoke-style subtitles.
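Each duration quoted in these listings is simply the end time minus the start time of an SRT timing line. Here is a minimal Python sketch for checking an output file the same way. The helper names are hypothetical and not part of whisper.cpp, and it assumes standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines:

```python
import re

# Matches an SRT timing line such as "00:00:03,500 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert hours, minutes, seconds and milliseconds to total milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def cue_durations_ms(srt_text):
    """Return the duration (end - start) in ms of every timing line found."""
    durations = []
    for match in TIMING.finditer(srt_text):
        groups = match.groups()
        durations.append(to_ms(*groups[4:]) - to_ms(*groups[:4]))
    return durations


def zero_duration_fraction(srt_text):
    """Fraction of cues whose start and end timestamps are identical."""
    durations = cue_durations_ms(srt_text)
    if not durations:
        return 0.0
    return sum(1 for d in durations if d == 0) / len(durations)
```

Passing the text of a generated .srt file to `zero_duration_fraction` gives the share of zero-duration cues, and `cue_durations_ms` can be filtered the same way for cues shorter than 10 ms.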


After (This PR)

Fixed: All tokens have readable duration

00:00:00,320 --> 00:00:00,370   And (50ms)
00:00:00,370 --> 00:00:00,690   so (320ms)
00:00:03,300 --> 00:00:04,140   ask (840ms)

Every token now has a non-zero duration (50 ms to 840 ms in the samples above), so the output is usable for karaoke-style subtitles.


Key Improvements:

| Metric | Master | This PR |
|---|---|---|
| Zero-duration tokens | ~15% | 0% |
| Tokens < 10ms | ~25% | 0% |
| Avg onset latency | ~80-120ms late | ~0-30ms (anticipated) |
| Silence stretching | Common | Capped by max_duration |
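To illustrate what capping by `max_duration` means for silence stretching, here is a rough Python sketch. It is not the code from this PR: the function name, the `(start_ms, end_ms)` tuple layout, and the millisecond unit are all assumptions made for the example. The idea is only that a token's end time is never allowed to run more than a fixed amount past its start, so a token followed by a long pause does not stay on screen for the whole pause.

```python
def cap_durations(cues, max_duration_ms):
    """Clamp each (start_ms, end_ms) cue to at most max_duration_ms.

    Hypothetical helper for illustration only; not whisper.cpp API.
    """
    capped = []
    for start_ms, end_ms in cues:
        # Keep the original end unless it exceeds start + max_duration_ms.
        capped.append((start_ms, min(end_ms, start_ms + max_duration_ms)))
    return capped
```

For example, with a cap of 1000 ms, a cue spanning 4140 ms to 9000 ms would be shortened to end at 5140 ms, while a cue spanning 3300 ms to 4140 ms would be left unchanged.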

Test Audio

Using standard samples/jfk.wav (JFK speech) from the repository.

Happy to provide more benchmarks or address any concerns!

obvirm, Dec 30 '25 09:12