whisper.cpp Incorrect timestamps

Fixes #2271

Uses the consecutive timestamp that follows the end of the last segment as the new starting timestamp
Adds these timestamps to the output when "print-special" is enabled
Fixes fflush usage in live reporting
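
As a rough illustration of the first change, here is a simplified Python model of how segment start times can be derived from a decoded token stream. It is a sketch, not the whisper.cpp code: the function `split_segments`, the flag `use_next_timestamp`, and the token stream `TOKENS` are hypothetical, and the stream was chosen by hand so that it reproduces the timestamps seen in the logs further down this thread.

```python
from typing import List, Tuple, Union

# Hypothetical token model: a float is a timestamp token (seconds),
# a str is a piece of decoded text.
Token = Union[float, str]
Segment = Tuple[float, float, str]


def split_segments(tokens: List[Token], use_next_timestamp: bool) -> List[Segment]:
    """Split a token stream into (start, end, text) segments.

    With use_next_timestamp=False, a segment starts where the previous one
    ended. With use_next_timestamp=True, a timestamp token that directly
    follows a segment end becomes the start of the next segment.
    """
    segments: List[Segment] = []
    start = 0.0
    words: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            words.append(tok)
        elif words:
            # A timestamp after text closes the current segment.
            segments.append((start, tok, " ".join(words)))
            words = []
            start = tok  # default: the next segment starts at this end time
        elif use_next_timestamp or not segments:
            # A timestamp with no pending text opens the next segment.
            start = tok
    return segments


# Assumed token stream, constructed to match the logs in this thread.
TOKENS: List[Token] = [
    0.0, "I-I-I just wanna tell you how I'm feelin'", 6.0,
    6.0, "Gotta make you understand that", 8.7,
    14.08, "Never gonna give you up, never gonna let you down", 18.08,
    22.6, "Never gonna run around and", 25.28,
]

for name, flag in (("ignoring extra timestamps", False), ("using extra timestamps", True)):
    print(name)
    for seg_start, seg_end, text in split_segments(TOKENS, flag):
        print(f"  [{seg_start:7.3f} --> {seg_end:7.3f}] {text}")
```

In this model, ignoring the extra timestamp makes the third segment start at 8.7 s, while using it makes the segment start at 14.08 s.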

I was not able to test this with the special "token_timestamps" option.

NB: This is my first GitHub PR, so go easy on me.

Jul 03 '24 10:07 bviksoe

@bviksoe

I tested this and it works. Good catch! How did you find that problem, and how did you get it fixed?

current whisper.cpp logs

cd /tmp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
wget "https://github.com/ggerganov/whisper.cpp/assets/61390950/bbf9d9c4-3d60-4693-832d-e48135edf379" -O audio.wav
cmake -B build .
cmake --build build
ffmpeg -i audio.wav -ar 16000 -ac 1 -c:a pcm_s16le normal.wav
./build/bin/main -f ./normal.wav -m "/Users/user/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin"

# Result

# [00:00:00.000 --> 00:00:06.000]   I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700]   Gotta make you understand that
# [00:00:08.700 --> 00:00:18.080]   Never gonna give you up, never gonna let you down
# [00:00:18.080 --> 00:00:25.280]   Never gonna run around and

PR log

# Test new PR

cd /tmp
git clone https://github.com/bviksoe/whisper.cpp -b master whisper1.cpp
cd whisper1.cpp
cmake -B build .
cmake --build build
./build/bin/main -f ../whisper.cpp/normal.wav -m "/Users/user/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin"

# Result

# [00:00:00.000 --> 00:00:06.000]   I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700]   Gotta make you understand that
# [00:00:14.080 --> 00:00:18.080]   Never gonna give you up, never gonna let you down
# [00:00:22.600 --> 00:00:25.280]   Never gonna run around and

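Notice that the start timestamp of the third segment is correct in the PR log (00:00:14.080 instead of 00:00:08.700). The start of the fourth segment also changes, from 00:00:18.080 to 00:00:22.600.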

Jul 05 '24 17:07 thewh1teagle

@thewh1teagle

How did you find that problem?

If you uncomment the line https://github.com/ggerganov/whisper.cpp/blob/c118733a29ad4a984015a5c08fd585086d01087a/src/whisper.cpp#L142 and recompile, you get an extended debug trace. You should then be able to see that the model actually produces extra timestamp tokens that this library was ignoring.

I was actually looking into why main is producing so many repetitions and hallucinations compared to similar libraries based on the same model.

Jul 05 '24 18:07 bviksoe

I am unable to build it on Mac M1. It gives many errors, including stuff like mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"

Jul 26 '24 16:07 manumaan

@manumaan Most likely unrelated to this PR as it doesn't affect any build procedures. The build error previously reported by CI was unrelated to this PR.

For general build problems, open a new Issue or ask in Discussions. Best advice is to use CMake (rather than Make) for building as outlined in the project overview. CMake worked for me on Windows.

Jul 26 '24 18:07 bviksoe

This really helped with the timing of my sentences. Previously a segment would start long before it was actually spoken, especially when music played between segments. However, my word-level timestamps still go out of sync.

Aug 14 '24 05:08 dkakaie

@bviksoe I found that it's accurate with the medium model, but with the tiny / small models the timestamps are incorrect around silences.

Aug 23 '24 12:08 thewh1teagle

I'm thinking of writing a script that changes any subtitle end time that is later than the start time on the next line so that it equals that start time. Below are three examples from whisper.cpp 1.6.2. Having subs stay on screen for seconds, or even many minutes, after new subs come on is pretty distracting. I hope to try out this PR someday, or that it gets merged.

06:22:32.254 --> 06:22:39.764 blah blah 1

06:22:39.764 --> 06:22:42.254 (<--for instance this would become 06:22:41.260) blah blah 2

06:22:41.260 --> 06:22:46.660 blah blah 3

06:22:46.660 --> 06:22:53.380 blah blah 4

06:22:53.380 --> 06:22:58.940 blah blah 5

06:22:58.940 --> 06:23:04.580 blah blah 6

40:03:11.653 --> 40:03:16.213 more and more 1

40:03:16.213 --> 40:12:32.523 (<-- would become 40:03:24.933) more and more 2

40:03:24.933 --> 40:03:29.573 more and more 3

40:03:30.373 --> 40:03:34.133 more and more 4

40:03:34.133 --> 40:03:34.773 more and more 5

57:02:18.560 --> 57:02:22.560 still some 1

57:02:22.560 --> 57:09:52.750 (<-- would become 57:02:30.560) still some 2

57:02:30.560 --> 57:02:34.560 still some 3

57:02:34.560 --> 57:02:38.560 still some 4

57:02:38.560 --> 57:02:40.560 still some 5
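
The rule described above can be sketched in Python as follows. This is only an illustration under assumptions: the helper names (`parse_cue`, `clamp_cue_ends`, `format_cue`) are hypothetical, and each cue is assumed to sit on a single line in the `start --> end text` form used in the examples, not in a full SRT or VTT file.

```python
import re
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start in ms, end in ms, text)

_TS = r"(\d+):(\d{2}):(\d{2})\.(\d{3})"
_CUE = re.compile("^" + _TS + " --> " + _TS + r"\s*(.*)$")


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the parts of an HH:MM:SS.mmm timestamp to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def _format_ms(total: int) -> str:
    """Convert milliseconds back to HH:MM:SS.mmm."""
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def parse_cue(line: str) -> Cue:
    """Parse one 'start --> end text' line into a cue."""
    match = _CUE.match(line.strip())
    if match is None:
        raise ValueError(f"not a cue line: {line!r}")
    g = match.groups()
    return _to_ms(*g[0:4]), _to_ms(*g[4:8]), g[8]


def format_cue(cue: Cue) -> str:
    start, end, text = cue
    return f"{_format_ms(start)} --> {_format_ms(end)} {text}"


def clamp_cue_ends(cues: List[Cue]) -> List[Cue]:
    """End each cue no later than the start of the cue that follows it."""
    fixed: List[Cue] = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            # Never move the end before the cue's own start.
            end = max(start, min(end, next_start))
        fixed.append((start, end, text))
    return fixed


lines = [
    "06:22:39.764 --> 06:22:42.254 blah blah 2",
    "06:22:41.260 --> 06:22:46.660 blah blah 3",
]
for cue in clamp_cue_ends([parse_cue(line) for line in lines]):
    print(format_cue(cue))
```

Working in milliseconds avoids comparing timestamps as text, so the sketch also handles hour fields of different widths.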

Oct 13 '24 17:10 mrfragger

This script fixes overlapping subs for vtt and srt:

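#!/bin/bash
# Fix overlapping subtitle cues in every .srt and .vtt file of the working folder.
# Requires ffmpeg, awk, git and aha.
shopt -s nullglob   # skip a loop when no file matches its pattern
cd ~/datampv/editing || exit 1
mkdir -p subsoverlapfixed
# Work on copies so the originals stay untouched.
for f in *.srt *.vtt
do cp "$f" subsoverlapfixed
done
cd subsoverlapfixed || exit 1
# Bring every input to SRT form, named <name>_overlapping.srt.
for f in *.srt
do cp "$f" "${f%.*}"_overlapping.srt
done
for f in *.vtt
do ffmpeg -y -i "$f" "${f%.*}"_overlapping.srt
done
for f in *_overlapping.srt
do awk '
  BEGIN {
    # One blank-line-separated cue per record, one line per field.
    RS = "";
    OFS = FS = "\n";
    # Read the first cue; its second line holds "start --> end".
    getline;
    n = split($0, prev_rec);
    split($2, prev_time, / --> /);
  }
  {
    split($2, a, / --> /);
    # If this cue starts before the previous one ends, end the previous cue here.
    # Timestamps are compared as strings, which works for fixed-width SRT times.
    if (a[1] < prev_time[2])
      prev_rec[2] = prev_time[1]" --> "a[1];
    for (i=1;i<=n;i++)
      print prev_rec[i];
    printf("\n");
    n = split($0, prev_rec);
    split($2, prev_time, / --> /)
  }
  END {
    # The last cue has no successor, so print it unchanged.
    print
  }' "$f" > "${f%_overlapping.*}"_overlapfixed.srt
done
# Convert the fixed SRT files back to VTT as well.
for f in *_overlapfixed.srt
do ffmpeg -y -i "$f" "${f%.*}".vtt
done
# Write a coloured character-level diff of each file as HTML.
for f in *_overlapping.srt
do git diff --no-index --word-diff=color --word-diff-regex=. "$f" "${f%_overlapping.*}"_overlapfixed.srt | aha --black > "${f%_overlapping.*}"_diff.html
done
rm -f *_overlapping.srt
# Collect everything from this run in a timestamped folder.
dt=$(date +%Y_%m_%d_%H_%M_%S)
mkdir "$dt"
for f in *
do [ "$f" != "$dt" ] && mv "$f" "$dt"
done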

Nov 04 '24 19:11 mrfragger

@bviksoe I found that it's accurate with the medium model, but with the tiny / small models the timestamps are incorrect around silences.

Me too.

Feb 26 '25 08:02 Carmel0