whisper.cpp
Hallucinating long loop
Hi,
Trying whisper-cpp on a 3-hour-long video, I encounter several loops, some lasting 30 minutes!
This change in Whisper seems to help: https://github.com/openai/whisper/pull/1903
Best regards
I have the same problem: since the last update (11.4.3) I can't transcribe any audio without hitting a hallucination loop. Before that update everything worked like a charm.
I'm on a MacBook Pro M4 Pro 24GB with Sequoia 15.2; temperatures are always under control and nothing gets hot here :)
Any steps to repro (input file + CLI call)?
MacWhisper > Open Files... > I open any kind of file (video/audio, m4a, mp4, m4v, etc.) of any duration > MacWhisper starts transcribing (see screenshot for settings) > suddenly and randomly MacWhisper starts hallucinating. That's it.
In this case I've opened an .m4a audio file on MacWhisper Pro 11.6 ⬇︎
I can provide another example if you guys are still interested in the problem.
I noticed this kind of hallucination quite a while ago, when looking for an implementation that can be hardware-accelerated on Mac, especially with longer audio input (typically longer than 20 min). I just hit a serious case while experimenting, and it seems to be consistently reproducible with the following steps.
Audio source
This Gamers Nexus video on EKWB: "How 4 People Destroyed a $250 Million Tech Company"
Pre-process:
ffmpeg -i <video> -vn -c:a pcm_s16le -ac 1 -ar 16000 exp.wav
Transcribe:
./whisper-cli -m ggml-large-v3.bin -l en -otxt -pc -f ../exp.wav
./whisper-cli -m ggml-large-v3.bin -l en -otxt -pc --vad -vm ggml-silero-v5.1.2.bin -f ../exp.wav
Result:
Shown in the pic above. Adding VAD produces a different result, but still a hallucinating loop.
For me as a video translator, this behavior renders whisper.cpp unusable: it dropped 10+ minutes of transcription right at the end, and there are other, less significant loops and repetitions after the 15-20 min mark.
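To locate such loops without reading the whole transcript, one option is to scan the -otxt output for runs of identical consecutive lines. The following is a minimal sketch, not part of whisper.cpp: it assumes the text output holds roughly one segment per line, and the function name `find_repeated_runs` and the threshold of 3 repeats are hypothetical choices.

```python
from itertools import groupby


def find_repeated_runs(lines, min_repeats=3):
    """Return (start_index, text, count) for every run of at least
    `min_repeats` identical consecutive non-empty lines.

    Lines are compared after stripping whitespace and lowercasing,
    so trivially different copies of a segment still count as one run.
    """
    normalized = [line.strip().lower() for line in lines]
    runs = []
    index = 0  # 0-based index of the first line of the current run
    for text, group in groupby(normalized):
        count = len(list(group))
        if text and count >= min_repeats:
            runs.append((index, text, count))
        index += count
    return runs


if __name__ == "__main__":
    import sys

    # Usage (assumed file name): python find_loops.py exp.wav.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for start, text, count in find_repeated_runs(f.readlines()):
            print(f"line {start + 1}: {count}x {text!r}")
```

This only catches exact repetition of a single segment; loops that cycle through several different segments would need a sliding-window comparison instead.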
Side note:
This problem was less severe in older versions, before the ./main CLI call was deprecated. But it was there too.
Try playing with the beam size.
Thanks for the tip.
After going to -bs 8 (the maximum available) with --vad, it is seemingly gone.
I'll report back if it reappears.
Sorry but after tinkering a little, I still believe this is a bug.
Any beam size other than the maximum of 8 results in hallucinating loops, and with values other than 5 or 8 they occur even sooner.
The worst part is that it just gets stuck, with no sign of recovering, for up to 40 min or even more.
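For anyone who wants to reproduce the beam-size comparison, here is a rough sketch of a sweep script. It reuses the flags from the commands earlier in this thread plus -bs. The -of flag (output path without extension) and the `exp_bs<N>` output names are assumptions added here to keep each run's transcript separate; check `./whisper-cli --help` for your build before relying on them.

```python
import subprocess

# Paths copied from the commands earlier in this thread; adjust as needed.
WHISPER_CLI = "./whisper-cli"
MODEL = "ggml-large-v3.bin"
VAD_MODEL = "ggml-silero-v5.1.2.bin"
AUDIO = "../exp.wav"


def build_command(beam_size, use_vad=True):
    """Build the whisper-cli argument list for one beam size."""
    cmd = [WHISPER_CLI, "-m", MODEL, "-l", "en", "-otxt", "-pc",
           "-bs", str(beam_size)]
    if use_vad:
        cmd += ["--vad", "-vm", VAD_MODEL]
    # Assumption: -of sets the output file name without extension,
    # so each beam size writes its own exp_bs<N>.txt.
    cmd += ["-of", f"exp_bs{beam_size}", "-f", AUDIO]
    return cmd


if __name__ == "__main__":
    # Beam sizes are illustrative; 5 and 8 are the values mentioned above.
    for bs in (2, 5, 8):
        subprocess.run(build_command(bs), check=True)
```

Comparing the resulting text files side by side should show at which point each beam size starts to loop.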
Nothing new: there are many other issues about hallucination, e.g. https://github.com/ggml-org/whisper.cpp/issues/1507 and https://github.com/ggml-org/whisper.cpp/issues/1515, especially with v3, not v2. @Backyham, such screenshots are not that useful; better to rerun with the debug/trace log level.