Incorrect timestamps
Fixes #2271
- Adds consecutive timestamps after the end of the last segment as the new starting timestamp
- Adds these timestamps to the output when "print-special" is enabled
- Fixes fflush usage in live reporting
I was not able to test this with the special "token_timestamps" option.
NB: This is my first GitHub PR, so go easy on me.
@bviksoe
I tested this and it works. Good catch! How did you find that problem, and how did you fix it?
Current whisper.cpp log
cd /tmp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
wget "https://github.com/ggerganov/whisper.cpp/assets/61390950/bbf9d9c4-3d60-4693-832d-e48135edf379" -O audio.wav
cmake -B build .
cmake --build build
ffmpeg -i audio.wav -ar 16000 -ac 1 -c:a pcm_s16le normal.wav
./build/bin/main -f ./normal.wav -m "/Users/user/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin"
# Result
# [00:00:00.000 --> 00:00:06.000] I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700] Gotta make you understand that
# [00:00:08.700 --> 00:00:18.080] Never gonna give you up, never gonna let you down
# [00:00:18.080 --> 00:00:25.280] Never gonna run around and
PR log
# Test new PR
cd /tmp
git clone https://github.com/bviksoe/whisper.cpp -b master whisper1.cpp
cd whisper1.cpp
cmake -B build .
cmake --build build
./build/bin/main -f ../whisper.cpp/normal.wav -m "/Users/user/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin"
# Result
# [00:00:00.000 --> 00:00:06.000] I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700] Gotta make you understand that
# [00:00:14.080 --> 00:00:18.080] Never gonna give you up, never gonna let you down
# [00:00:22.600 --> 00:00:25.280] Never gonna run around and
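Notice that the start timestamp of the third segment is correct in the PR log (00:00:14.080 instead of 00:00:08.700). The start of the fourth segment changes as well (00:00:22.600 instead of 00:00:18.080).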
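To make the difference between the two logs easier to see, here is a minimal Python sketch that parses the segment lines shown above and reports how far each start timestamp moved. The names `SEGMENT_RE`, `to_ms`, `parse_segments` and `start_shifts` are hypothetical helpers written for this comparison. They are not part of whisper.cpp, and the sketch assumes both logs contain the same segments in the same order.

```python
import re

# Matches segment lines such as "[00:00:08.700 --> 00:00:18.080] text".
SEGMENT_RE = re.compile(
    r"\[(\d+):(\d{2}):(\d{2})\.(\d{3}) --> (\d+):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)"
)

def to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_segments(log):
    """Return (start_ms, end_ms, text) for every segment line in a log."""
    segments = []
    for line in log.splitlines():
        match = SEGMENT_RE.search(line)
        if match:
            g = match.groups()
            segments.append((to_ms(*g[0:4]), to_ms(*g[4:8]), g[8].strip()))
    return segments

def start_shifts(before, after):
    """Pair segments by position; return (after start - before start) in ms."""
    pairs = zip(parse_segments(before), parse_segments(after))
    return [b[0] - a[0] for a, b in pairs]

# The two logs quoted in this comment.
CURRENT_LOG = """
# [00:00:00.000 --> 00:00:06.000] I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700] Gotta make you understand that
# [00:00:08.700 --> 00:00:18.080] Never gonna give you up, never gonna let you down
# [00:00:18.080 --> 00:00:25.280] Never gonna run around and
"""
PR_LOG = """
# [00:00:00.000 --> 00:00:06.000] I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700] Gotta make you understand that
# [00:00:14.080 --> 00:00:18.080] Never gonna give you up, never gonna let you down
# [00:00:22.600 --> 00:00:25.280] Never gonna run around and
"""

print(start_shifts(CURRENT_LOG, PR_LOG))  # [0, 0, 5380, 4520]
```

In other words, with the PR the third segment starts 5.38 s later and the fourth 4.52 s later, while the end timestamps stay the same.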
@thewh1teagle
How did you find that problem?
If you uncomment the line https://github.com/ggerganov/whisper.cpp/blob/c118733a29ad4a984015a5c08fd585086d01087a/src/whisper.cpp#L142 and recompile, you get an extended debug trace. You should then be able to see that the model actually produces extra timestamp tokens that this library was ignoring.
I was actually looking into why main is producing so many repetitions and hallucinations compared to similar libraries based on the same model.
I am unable to build it on Mac M1. It gives many errors, including stuff like mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
@manumaan Most likely unrelated to this PR as it doesn't affect any build procedures. The build error previously reported by CI was unrelated to this PR.
For general build problems, open a new Issue or ask in Discussions. Best advice is to use CMake (rather than Make) for building as outlined in the project overview. CMake worked for me on Windows.
This really helped with the timing of my sentences. Before, a segment would start long before it was actually spoken, especially when music played between segments. However, my word-level timestamps still go out of sync.
@bviksoe I found that it's accurate with the medium model, but with tiny / small the timestamps are incorrect around silences.
I'm thinking of writing a script that changes any subtitle end time that is later than the start time on the next line so that it equals that start time. Here are three examples from whisper.cpp 1.6.2. Having subs stay on screen for many seconds or even minutes after new subs come on is pretty distracting. I hope to try out this PR someday, or that it gets merged.
06:22:32.254 --> 06:22:39.764 blah blah 1
06:22:39.764 --> 06:22:42.254 (<--for instance this would become 06:22:41.260) blah blah 2
06:22:41.260 --> 06:22:46.660 blah blah 3
06:22:46.660 --> 06:22:53.380 blah blah 4
06:22:53.380 --> 06:22:58.940 blah blah 5
06:22:58.940 --> 06:23:04.580 blah blah 6
40:03:11.653 --> 40:03:16.213 more and more 1
40:03:16.213 --> 40:12:32.523 (<-- would become 40:03:24.933) more and more 2
40:03:24.933 --> 40:03:29.573 more and more 3
40:03:30.373 --> 40:03:34.133 more and more 4
40:03:34.133 --> 40:03:34.773 more and more 5
57:02:18.560 --> 57:02:22.560 still some 1
57:02:22.560 --> 57:09:52.750 (<-- would become 57:02:30.560) still some 2
57:02:30.560 --> 57:02:34.560 still some 3
57:02:34.560 --> 57:02:38.560 still some 4
57:02:38.560 --> 57:02:40.560 still some 5
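The rule described above (pull an end time back to the next line's start time whenever it runs past it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_ts` and `clamp_overlaps` are hypothetical names, cues are plain `(start, end, text)` tuples, and timestamps use the same `HH:MM:SS.mmm` layout as the examples.

```python
def parse_ts(ts):
    """Convert a timestamp string like "40:03:16.213" to milliseconds."""
    h, m, rest = ts.split(":")
    s, ms = rest.split(".")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def clamp_overlaps(cues):
    """Return a copy of cues in which no cue ends after the next cue starts.

    cues is a list of (start, end, text) tuples in display order.
    """
    fixed = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            # Only shorten a cue; never extend one to fill a gap.
            if parse_ts(end) > parse_ts(next_start):
                end = next_start
        fixed.append((start, end, text))
    return fixed

# Second example from the list above.
cues = [
    ("40:03:11.653", "40:03:16.213", "more and more 1"),
    ("40:03:16.213", "40:12:32.523", "more and more 2"),
    ("40:03:24.933", "40:03:29.573", "more and more 3"),
    ("40:03:30.373", "40:03:34.133", "more and more 4"),
]

for start, end, text in clamp_overlaps(cues):
    print(f"{start} --> {end} {text}")
```

Only the second cue changes: its end becomes 40:03:24.933. The gap between the third and fourth cues is left alone, because the rule only shortens cues that overlap the next one.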
Script that fixes overlapping subs for vtt and srt files
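#!/bin/bash
# Fix overlapping subtitle timings for every .srt and .vtt file in the editing directory.
# Requires ffmpeg, awk, git and aha.
shopt -s nullglob    # a glob with no matches expands to nothing instead of itself
cd ~/datampv/editing || exit 1
mkdir -p subsoverlapfixed

# Work on copies so the originals stay untouched.
for f in *.srt *.vtt
do cp "$f" subsoverlapfixed
done
cd subsoverlapfixed || exit 1

# Bring every input into the form <name>_overlapping.srt (vtt is converted with ffmpeg).
for f in *.srt
do cp "$f" "${f%.*}"_overlapping.srt
done
for f in *.vtt
do ffmpeg -y -i "$f" "${f%.*}"_overlapping.srt
done

# Rewrite each cue whose end time is later than the start time of the next cue.
for f in *_overlapping.srt
do awk '
    BEGIN {
        RS = "";            # paragraph mode: one cue per record
        OFS = FS = "\n";    # one field per line of the cue
        getline;            # read the first cue
        n = split($0, prev_rec);
        split($2, prev_time, / --> /);
    }
    {
        split($2, a, / --> /);
        # If this cue starts before the previous cue ends, end the previous cue here.
        # String comparison works because all times share the same fixed-width format.
        if (a[1] < prev_time[2])
            prev_rec[2] = prev_time[1]" --> "a[1];
        for (i=1;i<=n;i++)
            print prev_rec[i];
        printf("\n");
        n = split($0, prev_rec);
        split($2, prev_time, / --> /)
    }
    END {
        print               # the last cue has no successor, so print it unchanged
    }' "$f" > "${f%_overlapping.*}"_overlapfixed.srt
done

# Convert the fixed files back to vtt.
for f in *_overlapfixed.srt
do ffmpeg -y -i "$f" "${f%.*}".vtt
done

# Write a colored character-level diff of each file as HTML.
for f in *_overlapping.srt
do git diff --no-index --word-diff=color --word-diff-regex=. "$f" "${f%_overlapping.*}"_overlapfixed.srt | aha --black > "${f%_overlapping.*}"_diff.html
done
rm -f -- *_overlapping.srt

# Move the files from this run into a timestamped directory.
# Only regular files are moved, so the new directory is never moved into itself.
dt=$(date +%Y_%m_%d_%H_%M_%S)
mkdir "$dt"
for f in *
do [ -f "$f" ] && mv "$f" "$dt"
done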
@bviksoe I found that it's accurate with the medium model, but with tiny / small the timestamps are incorrect around silences.
Me too.