whisper.cpp
Using `--max-len` gives weird time codes
When running whisper.cpp with e.g. `--max-len 77`, I get some weird time codes: some fields are negative, and the affected segments come out in reverse order.
It does not happen without `--max-len`.
Examples:
[00:34:35.820 --> 00:34:36.820] You built that with your brothers.
[00:-3:-29.-700 --> 00:41:15.820] with and you were gone more than you were home.
[00:41:11.820 --> 00:-3:-29.-700] They resent the hell out of you because you were the parent they were left
[00:41:08.820 --> 00:41:11.820] Most of your children don't speak to their mother.
[01:03:24.820 --> 01:03:26.820] Well, have the Grayson brothers signed it yet?
[00:18:36.250 --> 01:03:24.820] was a solid deal, and you weren't here, so paperwork's already been filed.
[01:03:15.820 --> 00:18:36.250] Listen, Gabrielle convinced the other partners that the Playground merger
The `--max-len` option uses the experimental approach for word-level timestamps that I implemented, and it probably has a bug. We might be able to debug it if you uncomment the following lines and send me the output around the timestamps that get messed up:
https://github.com/ggerganov/whisper.cpp/blob/164df0d447bd16a4f6cd0e08caef5b01a63354af/whisper.cpp#L3343-L3353
It should look something like this:
...
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.000 0.000 3.010 10044 10057 ' can'
whisper_exp_compute_token_level_timestamps: [?] 0.996 0.012 0.000 4.010 10060 10082 ' find'
whisper_exp_compute_token_level_timestamps: [?] 1.000 0.998 0.000 4.010 10082 10102 ' hope'
whisper_exp_compute_token_level_timestamps: [?] 0.994 0.082 0.000 3.010 10110 10124 ' and'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.095 0.000 11.010 10124 10193 ' inspiration'
whisper_exp_compute_token_level_timestamps: [_TT_722] 0.940 0.603 0.057 2.010 10196 10206 ' in'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.409 0.001 3.010 10206 10221 ' our'
whisper_exp_compute_token_level_timestamps: [?] 0.995 0.249 0.002 10.010 10221 10269 ' commitment'
whisper_exp_compute_token_level_timestamps: [_TT_760] 0.829 0.533 0.162 2.010 10276 10294 ' to'
whisper_exp_compute_token_level_timestamps: [?] 0.984 0.042 0.000 7.010 10294 10338 ' liberty'
whisper_exp_compute_token_level_timestamps: [?] 0.998 0.357 0.002 3.000 10373 10407 '.'
whisper_exp_compute_token_level_timestamps: [_TT_828] 0.364 0.364 1.000 15.000 10408 10408 '[_TT_828]'
[00:01:40.400 --> 00:01:44.080] can find hope and inspiration in our commitment to liberty.
whisper_exp_compute_token_level_timestamps: [?] 0.998 0.500 0.000 3.010 10414 10421 ' For'
whisper_exp_compute_token_level_timestamps: [?] 1.000 0.000 0.000 4.010 10421 10438 ' more'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.000 0.000 4.010 10438 10453 ' than'
whisper_exp_compute_token_level_timestamps: [?] 0.992 0.143 0.000 3.010 10454 10468 ' two'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.000 0.000 9.010 10468 10508 ' centuries'
whisper_exp_compute_token_level_timestamps: [?] 0.984 0.267 0.001 2.000 10508 10519 ','
whisper_exp_compute_token_level_timestamps: [_TT_884] 0.581 0.577 0.401 9.010 10519 10571 ' Americans'
whisper_exp_compute_token_level_timestamps: [_TT_910] 0.372 0.652 0.571 15.000 10572 10572 '[_TT_910]'
[00:01:44.080 --> 00:01:45.720] For more than two centuries, Americans
...
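To spot problematic tokens in a long log like this, the lines can be checked mechanically. The sketch below is not part of whisper.cpp. It assumes, based only on the sample above, that the last two integers on each debug line are the token start and end in 10 ms units (10408 lines up with the segment end 00:01:44.080), and it skips the four floating-point columns without interpreting them. `TokenSpan`, `parse_debug_line` and `is_suspicious` are hypothetical names.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record for the two trailing integers of a debug line.
struct TokenSpan {
    int64_t t0 = 0;     // token start, assumed to be in 10 ms units
    int64_t t1 = 0;     // token end, same assumed unit
    bool    ok = false; // true only if the line had the expected shape
};

// Parse a line shaped like:
//   <prefix>: [<tag>] <float> <float> <float> <float> <t0> <t1> '<text>'
// Anything else (for example a plain "[hh:mm:ss --> ...] text" segment line)
// is returned with ok == false.
TokenSpan parse_debug_line(const std::string & line) {
    TokenSpan span;

    // The tag is the first bracketed field, so the first ']' ends it.
    const auto close = line.find(']');
    if (close == std::string::npos) {
        return span;
    }

    std::istringstream in(line.substr(close + 1));
    double skipped[4]; // four floating-point columns, not interpreted here
    if (in >> skipped[0] >> skipped[1] >> skipped[2] >> skipped[3]
           >> span.t0 >> span.t1) {
        span.ok = true;
    }
    return span;
}

// A span looks wrong if it is negative or ends before it starts.
bool is_suspicious(const TokenSpan & s) {
    return s.ok && (s.t0 < 0 || s.t1 < s.t0);
}
```

Feeding each line of the output through `parse_debug_line` and printing the ones for which `is_suspicious` returns true should narrow the log down to the region worth attaching to the issue.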
Log (using the medium model):
[00:43:44.820 --> 00:43:46.820] I was right behind them.
whisper_exp_compute_token_level_timestamps: [_TT_800] 0.888 0.934 0.951 15.000 262682 262682 '[_TT_800]'
whisper_exp_compute_token_level_timestamps: [?] 0.997 0.671 0.001 3.000 -5774 -5743 '.'
whisper_exp_compute_token_level_timestamps: [?] 0.937 0.007 0.000 4.010 -5807 -5774 ' them'
whisper_exp_compute_token_level_timestamps: [?] 1.000 0.000 0.000 6.010 -5872 -5818 ' behind'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.008 0.000 5.010 -5911 -5872 ' right'
whisper_exp_compute_token_level_timestamps: [?] 1.000 0.500 0.000 3.010 -5940 -5911 ' was'
whisper_exp_compute_token_level_timestamps: [?] 0.756 0.004 0.000 1.010 -5952 -5940 ' I'
[00:-1:-1.-890 --> 00:43:44.820] they could.
[00:43:30.820 --> 00:-1:-1.-890] You know, I didn't let those girls walk home alone. I only let them think
whisper_exp_compute_token_level_timestamps: [_TT_700] 0.447 0.754 0.593 15.000 262482 262482 '[_TT_700]'
whisper_exp_compute_token_level_timestamps: [?] 0.992 0.360 0.001 3.000 -6004 -5957 '.'
whisper_exp_compute_token_level_timestamps: [?] 1.000 0.998 0.000 5.010 -6106 -6004 ' could'
whisper_exp_compute_token_level_timestamps: [?] 0.991 0.038 0.000 4.010 -6189 -6117 ' they'
whisper_exp_compute_token_level_timestamps: [?] 0.997 0.011 0.000 5.010 -6275 -6189 ' think'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.033 0.000 4.010 -6323 -6275 ' them'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.111 0.000 3.010 -6416 -6358 ' let'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.050 0.000 4.010 -6488 -6416 ' only'
whisper_exp_compute_token_level_timestamps: [_TT_600] 0.799 0.900 0.199 1.010 -6510 -6488 ' I'
whisper_exp_compute_token_level_timestamps: [?] 0.979 0.279 0.000 3.000 -6565 -6510 '.'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.022 0.000 5.010 -6659 -6565 ' alone'
whisper_exp_compute_token_level_timestamps: [?] 0.995 0.016 0.000 4.010 -6735 -6659 ' home'
whisper_exp_compute_token_level_timestamps: [?] 0.996 0.005 0.000 4.010 -6809 -6735 ' walk'
whisper_exp_compute_token_level_timestamps: [?] 0.998 0.010 0.000 5.010 -6897 -6809 ' girls'
whisper_exp_compute_token_level_timestamps: [?] 0.994 0.018 0.000 5.010 -6977 -6897 ' those'
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.013 0.000 3.010 -7052 -6977 ' let'
whisper_exp_compute_token_level_timestamps: [?] 1.000 0.998 0.000 2.000 -7091 -7052 ''t'
whisper_exp_compute_token_level_timestamps: [?] 0.998 0.008 0.000 4.010 -7165 -7091 ' didn'
whisper_exp_compute_token_level_timestamps: [?] 0.998 0.018 0.000 1.010 -7185 -7165 ' I'
whisper_exp_compute_token_level_timestamps: [?] 0.712 0.006 0.000 2.000 -7216 -7185 ','
whisper_exp_compute_token_level_timestamps: [?] 0.999 0.000 0.000 4.010 -7267 -7216 ' know'
whisper_exp_compute_token_level_timestamps: [?] 0.967 0.007 0.000 3.010 -7353 -7297 ' You'
[00:43:28.820 --> 00:43:30.820] Good morning.
whisper_exp_compute_token_level_timestamps: [_TT_1400] 0.808 0.810 0.997 18.000 261082 261082 '[_TT_1400]'
whisper_exp_compute_token_level_timestamps: [_TT_1400] 0.977 0.815 0.015 3.000 -16118 -7360 '.'
whisper_exp_compute_token_level_timestamps: [?] 0.911 0.017 0.000 7.010 -9197 -9197 ' morning'
whisper_exp_compute_token_level_timestamps: [?] 0.910 0.007 0.000 4.010 -7553 -9197 ' Good'
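The odd strings such as `00:-1:-1.-890` are consistent with a negative time value being split into hours, minutes, seconds and milliseconds using integer division, which truncates toward zero in C++. The sketch below illustrates that effect. It is an assumption about how such a formatter behaves, not a copy of the whisper.cpp code, and `format_ticks` is a hypothetical name. The input is taken to be a time in 10 ms units, as the logged values suggest.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a time given in 10 ms units as HH:MM:SS.mmm.
// Integer division truncates toward zero, so for a negative input every
// non-zero field carries its own minus sign instead of one leading sign.
std::string format_ticks(int64_t t) {
    int64_t msec = t * 10;

    const int64_t hr = msec / (1000 * 60 * 60);
    msec -= hr * (1000 * 60 * 60);

    const int64_t min = msec / (1000 * 60);
    msec -= min * (1000 * 60);

    const int64_t sec = msec / 1000;
    msec -= sec * 1000;

    char buf[64];
    std::snprintf(buf, sizeof(buf), "%02d:%02d:%02d.%03d",
                  (int) hr, (int) min, (int) sec, (int) msec);
    return std::string(buf);
}
```

With this sketch, `format_ticks(-6189)` gives `00:-1:-1.-890`, and -6189 is the logged start of the token ' they' that opens the broken segment above. Likewise `format_ticks(-20970)` gives `00:-3:-29.-700`, the string from the first report. So the printed time codes are most likely a faithful rendering of negative token times, and the actual bug is upstream, where those times are computed.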
I think this should be fixed now, thanks to this commit: https://github.com/ggerganov/whisper.cpp/commit/08dc705a694248fb94b6f64cbeb93f4e474635e3
Cool, it's working now :)