
Using `--max-len` gives weird time codes

Open niksedk opened this issue 2 years ago • 2 comments

When running whisper.cpp with, e.g., `--max-len 77`, I get some malformed time codes with negative minute, second, and millisecond fields. This does not happen without `--max-len`.

Examples:

[00:34:35.820 --> 00:34:36.820]   You built that with your brothers.

[00:-3:-29.-700 --> 00:41:15.820]   with and you were gone more than you were home.

[00:41:11.820 --> 00:-3:-29.-700]   They resent the hell out of you because you were the parent they were left

[00:41:08.820 --> 00:41:11.820]   Most of your children don't speak to their mother.

[01:03:24.820 --> 01:03:26.820]   Well, have the Grayson brothers signed it yet?

[00:18:36.250 --> 01:03:24.820]   was a solid deal, and you weren't here, so paperwork's already been filed.

[01:03:15.820 --> 00:18:36.250]   Listen, Gabrielle convinced the other partners that the Playground merger

niksedk avatar Nov 27 '22 07:11 niksedk

The `--max-len` option relies on the experimental word-level timestamp approach that I implemented, and it probably has a bug. We might be able to debug it if you uncomment the following lines and send me the output around the timestamps that get messed up:

https://github.com/ggerganov/whisper.cpp/blob/164df0d447bd16a4f6cd0e08caef5b01a63354af/whisper.cpp#L3343-L3353

It should look something like this:

...
whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.000  0.000  3.010 10044 10057 ' can'
whisper_exp_compute_token_level_timestamps:        [?]  0.996  0.012  0.000  4.010 10060 10082 ' find'
whisper_exp_compute_token_level_timestamps:        [?]  1.000  0.998  0.000  4.010 10082 10102 ' hope'
whisper_exp_compute_token_level_timestamps:        [?]  0.994  0.082  0.000  3.010 10110 10124 ' and'
whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.095  0.000 11.010 10124 10193 ' inspiration'
whisper_exp_compute_token_level_timestamps:  [_TT_722]  0.940  0.603  0.057  2.010 10196 10206 ' in'
whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.409  0.001  3.010 10206 10221 ' our'
whisper_exp_compute_token_level_timestamps:        [?]  0.995  0.249  0.002 10.010 10221 10269 ' commitment'
whisper_exp_compute_token_level_timestamps:  [_TT_760]  0.829  0.533  0.162  2.010 10276 10294 ' to'
whisper_exp_compute_token_level_timestamps:        [?]  0.984  0.042  0.000  7.010 10294 10338 ' liberty'
whisper_exp_compute_token_level_timestamps:        [?]  0.998  0.357  0.002  3.000 10373 10407 '.'
whisper_exp_compute_token_level_timestamps:  [_TT_828]  0.364  0.364  1.000 15.000 10408 10408 '[_TT_828]'
[00:01:40.400 --> 00:01:44.080]   can find hope and inspiration in our commitment to liberty.
whisper_exp_compute_token_level_timestamps:        [?]  0.998  0.500  0.000  3.010 10414 10421 ' For'
whisper_exp_compute_token_level_timestamps:        [?]  1.000  0.000  0.000  4.010 10421 10438 ' more'
whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.000  0.000  4.010 10438 10453 ' than'
whisper_exp_compute_token_level_timestamps:        [?]  0.992  0.143  0.000  3.010 10454 10468 ' two'
whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.000  0.000  9.010 10468 10508 ' centuries'
whisper_exp_compute_token_level_timestamps:        [?]  0.984  0.267  0.001  2.000 10508 10519 ','
whisper_exp_compute_token_level_timestamps:  [_TT_884]  0.581  0.577  0.401  9.010 10519 10571 ' Americans'
whisper_exp_compute_token_level_timestamps:  [_TT_910]  0.372  0.652  0.571 15.000 10572 10572 '[_TT_910]'
[00:01:44.080 --> 00:01:45.720]   For more than two centuries, Americans
...

ggerganov avatar Nov 27 '22 09:11 ggerganov

Log (using the medium model):

[00:43:44.820 --> 00:43:46.820]   I was right behind them.

whisper_exp_compute_token_level_timestamps:  [_TT_800]  0.888  0.934  0.951 15.000 262682 262682 '[_TT_800]'

whisper_exp_compute_token_level_timestamps:        [?]  0.997  0.671  0.001  3.000 -5774 -5743 '.'

whisper_exp_compute_token_level_timestamps:        [?]  0.937  0.007  0.000  4.010 -5807 -5774 ' them'

whisper_exp_compute_token_level_timestamps:        [?]  1.000  0.000  0.000  6.010 -5872 -5818 ' behind'

whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.008  0.000  5.010 -5911 -5872 ' right'

whisper_exp_compute_token_level_timestamps:        [?]  1.000  0.500  0.000  3.010 -5940 -5911 ' was'

whisper_exp_compute_token_level_timestamps:        [?]  0.756  0.004  0.000  1.010 -5952 -5940 ' I'

[00:-1:-1.-890 --> 00:43:44.820]   they could.

[00:43:30.820 --> 00:-1:-1.-890]   You know, I didn't let those girls walk home alone. I only let them think

whisper_exp_compute_token_level_timestamps:  [_TT_700]  0.447  0.754  0.593 15.000 262482 262482 '[_TT_700]'

whisper_exp_compute_token_level_timestamps:        [?]  0.992  0.360  0.001  3.000 -6004 -5957 '.'

whisper_exp_compute_token_level_timestamps:        [?]  1.000  0.998  0.000  5.010 -6106 -6004 ' could'

whisper_exp_compute_token_level_timestamps:        [?]  0.991  0.038  0.000  4.010 -6189 -6117 ' they'

whisper_exp_compute_token_level_timestamps:        [?]  0.997  0.011  0.000  5.010 -6275 -6189 ' think'

whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.033  0.000  4.010 -6323 -6275 ' them'

whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.111  0.000  3.010 -6416 -6358 ' let'

whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.050  0.000  4.010 -6488 -6416 ' only'

whisper_exp_compute_token_level_timestamps:  [_TT_600]  0.799  0.900  0.199  1.010 -6510 -6488 ' I'

whisper_exp_compute_token_level_timestamps:        [?]  0.979  0.279  0.000  3.000 -6565 -6510 '.'

whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.022  0.000  5.010 -6659 -6565 ' alone'

whisper_exp_compute_token_level_timestamps:        [?]  0.995  0.016  0.000  4.010 -6735 -6659 ' home'

whisper_exp_compute_token_level_timestamps:        [?]  0.996  0.005  0.000  4.010 -6809 -6735 ' walk'

whisper_exp_compute_token_level_timestamps:        [?]  0.998  0.010  0.000  5.010 -6897 -6809 ' girls'

whisper_exp_compute_token_level_timestamps:        [?]  0.994  0.018  0.000  5.010 -6977 -6897 ' those'

whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.013  0.000  3.010 -7052 -6977 ' let'

whisper_exp_compute_token_level_timestamps:        [?]  1.000  0.998  0.000  2.000 -7091 -7052 ''t'

whisper_exp_compute_token_level_timestamps:        [?]  0.998  0.008  0.000  4.010 -7165 -7091 ' didn'

whisper_exp_compute_token_level_timestamps:        [?]  0.998  0.018  0.000  1.010 -7185 -7165 ' I'

whisper_exp_compute_token_level_timestamps:        [?]  0.712  0.006  0.000  2.000 -7216 -7185 ','

whisper_exp_compute_token_level_timestamps:        [?]  0.999  0.000  0.000  4.010 -7267 -7216 ' know'

whisper_exp_compute_token_level_timestamps:        [?]  0.967  0.007  0.000  3.010 -7353 -7297 ' You'

[00:43:28.820 --> 00:43:30.820]   Good morning.

whisper_exp_compute_token_level_timestamps: [_TT_1400]  0.808  0.810  0.997 18.000 261082 261082 '[_TT_1400]'

whisper_exp_compute_token_level_timestamps: [_TT_1400]  0.977  0.815  0.015  3.000 -16118 -7360 '.'

whisper_exp_compute_token_level_timestamps:        [?]  0.911  0.017  0.000  7.010 -9197 -9197 ' morning'

whisper_exp_compute_token_level_timestamps:        [?]  0.910  0.007  0.000  4.010 -7553 -9197 ' Good'
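
The negative token times in this log appear to line up with the malformed time codes. Below is a minimal sketch of why, assuming token times are counted in units of 10 ms and the formatter splits them with plain truncating integer division. The name `format_ticks` is hypothetical and used only for illustration; it is not claimed to be the exact whisper.cpp code.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a time given in 10 ms ticks as HH:MM:SS.mmm.
// Assumption: no clamping of negative input, so a negative tick count
// leaks its sign into every non-zero field.
std::string format_ticks(int64_t t) {
    int64_t msec = t * 10;

    // C++ integer division truncates toward zero, so each quotient and
    // remainder keeps the sign of a negative input.
    int64_t hr = msec / (1000 * 60 * 60);
    msec -= hr * (1000 * 60 * 60);
    int64_t min = msec / (1000 * 60);
    msec -= min * (1000 * 60);
    int64_t sec = msec / 1000;
    msec -= sec * 1000;

    char buf[32];
    snprintf(buf, sizeof(buf), "%02d:%02d:%02d.%03d",
             (int) hr, (int) min, (int) sec, (int) msec);
    return std::string(buf);
}
```

With this sketch, `format_ticks(-6189)` (the t0 logged for ' they') gives `00:-1:-1.-890`, the same string as in the broken segment above, while a valid value such as `format_ticks(10408)` gives `00:01:44.080`. That suggests the formatting step is fine and the bug is in how the token times themselves become negative.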


niksedk avatar Nov 27 '22 14:11 niksedk

I think this should be fixed now thanks to https://github.com/ggerganov/whisper.cpp/commit/08dc705a694248fb94b6f64cbeb93f4e474635e3

ggerganov avatar Jan 08 '23 13:01 ggerganov

Cool, it's working now :)

niksedk avatar Jan 08 '23 20:01 niksedk