whisper.cpp

Hallucinating long loop

Open Chevek opened this issue 9 months ago • 3 comments

Hi,

I'm trying whisper-cpp on a 3-hour-long video and I encounter several loops, some lasting as long as 30 minutes!

This change in Whisper seems to help: https://github.com/openai/whisper/pull/1903

Best regards

Chevek avatar Jan 21 '25 22:01 Chevek

I have the same problem: since the last update (11.4.3) I can't transcribe any audio without hitting a hallucinating loop. Before the last update everything was working like a charm.

I'm on a MacBook Pro M4 Pro 24GB with Sequoia 15.2; temperatures are always under control and nothing gets hot here :)

gabrielealtobelli avatar Jan 22 '25 14:01 gabrielealtobelli

any steps to repro (input file + CLI call) ?

WilliamTambellini avatar Feb 19 '25 18:02 WilliamTambellini

MacWhisper > Open Files... > I open any kind of file (video/audio, m4a, mp4, m4v, etc.) of any duration > MacWhisper starts transcribing (see screenshot for settings) > suddenly and randomly MacWhisper starts hallucinating. That's it.

In this case I've opened an .m4a audio file on MacWhisper Pro 11.6 ⬇︎

Image

gabrielealtobelli avatar Feb 24 '25 09:02 gabrielealtobelli

I can provide another example if you guys are still interested in the problem.

I noticed this kind of hallucination quite a while ago, when looking for an implementation that can be HW-accelerated on Mac. It shows up especially on longer audio input (typically longer than 20 min). I just ran into a serious case while experimenting, and it seems to be consistently reproducible with the following steps.

Image

Audio source

This Gamers Nexus Video on EKWB How 4 People Destroyed a $250 Million Tech Company

Pre-process:

ffmpeg -i <video> -vn -c:a pcm_s16le -ac 1 -ar 16000 exp.wav

Transcribe:

./whisper-cli -m ggml-large-v3.bin -l en -otxt -pc -f ../exp.wav
./whisper-cli -m ggml-large-v3.bin -l en -otxt -pc --vad -vm ggml-silero-v5.1.2.bin -f ../exp.wav

Result:

Shown in the pic above. Adding VAD gives a different result, but it is still a hallucinating loop.

Image

This behavior - to me as a video translator - renders whisper.cpp unusable, since it dropped 10+ minutes of transcription right at the end, and there are other, less significant loops and repetitions after the 15-20 min mark.
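To put numbers on loops like these instead of relying on screenshots, the text output written by -otxt can be scanned for runs of identical consecutive lines. The following is only a minimal sketch: the function names and the `min_repeats` threshold are hypothetical, and it assumes the transcript has one segment per line. It only catches a single line repeated back to back, not longer multi-line cycles.

```python
from itertools import groupby


def find_repeat_runs(lines, min_repeats=5):
    """Return (start_index, repeat_count, text) for each run of identical consecutive lines.

    start_index is 0-based. min_repeats is an arbitrary threshold chosen for
    illustration, not a value taken from whisper.cpp.
    """
    # Normalise whitespace and case so trivially different copies still match.
    normalised = [" ".join(line.split()).lower() for line in lines]
    runs = []
    index = 0
    for text, group in groupby(normalised):
        count = len(list(group))
        # Skip blank lines; report only runs that reach the threshold.
        if text and count >= min_repeats:
            runs.append((index, count, text))
        index += count
    return runs


def find_repeat_runs_in_file(path, min_repeats=5):
    """Convenience wrapper: scan a transcript file (path is supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_repeat_runs(handle.read().splitlines(), min_repeats)
```

Calling `find_repeat_runs_in_file` on the generated .txt transcript returns one tuple per suspected loop, which makes it easier to compare runs with and without --vad.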

Backyham avatar Jul 17 '25 16:07 Backyham

Side note: this problem is less severe in older versions, from before the ./main CLI call was deprecated. But it is still there.

Backyham avatar Jul 17 '25 16:07 Backyham

Try playing with the beam size.

WilliamTambellini avatar Jul 17 '25 16:07 WilliamTambellini

Thanks for the tip.

After going to -bs 8 (the max available) with --vad, it is seemingly gone.

I'll report back if it reappears in the future.

Backyham avatar Jul 18 '25 03:07 Backyham

Sorry but after tinkering a little, I still believe this is a bug.

Any beam size other than the maximum of 8 results in hallucinating loops, and with values other than 5 or 8 the loops start even sooner.

The worst part is it just got stuck, with no signs of recovering, going on up to 40 min or even more.
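For anyone who wants to repeat this comparison, here is a rough sketch of a beam-size sweep built on the commands from the earlier comment. The helper names, the list of beam sizes, and the output naming scheme are hypothetical. It also assumes that -of sets the output file path without its extension; please check that against ./whisper-cli --help before relying on it.

```python
import subprocess


def build_command(beam_size, use_vad, audio="../exp.wav",
                  model="ggml-large-v3.bin",
                  vad_model="ggml-silero-v5.1.2.bin",
                  binary="./whisper-cli"):
    """Build one whisper-cli invocation as an argument list."""
    cmd = [binary, "-m", model, "-l", "en", "-otxt", "-pc", "-bs", str(beam_size)]
    if use_vad:
        cmd += ["--vad", "-vm", vad_model]
    # Assumption: -of takes the output path without extension, so each run
    # writes its own transcript instead of overwriting the previous one.
    suffix = "vad" if use_vad else "novad"
    cmd += ["-of", f"exp_bs{beam_size}_{suffix}", "-f", audio]
    return cmd


def run_sweep(beam_sizes=(1, 2, 5, 8), dry_run=True):
    """Print (dry_run=True) or execute every beam size with and without VAD."""
    for beam_size in beam_sizes:
        for use_vad in (False, True):
            cmd = build_command(beam_size, use_vad)
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)
```

Running `run_sweep()` with the default `dry_run=True` only prints the commands, so they can be checked before committing to several long transcription runs.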

Image Image

Backyham avatar Jul 18 '25 03:07 Backyham

Nothing new: there are many other issues about hallucination, especially with v3, not v2:
https://github.com/ggml-org/whisper.cpp/issues/1507
https://github.com/ggml-org/whisper.cpp/issues/1515

@Backyham such screenshots are not that useful; it's better to rerun with the debug/trace log level.

WilliamTambellini avatar Aug 20 '25 17:08 WilliamTambellini