whisper entering a transcription loop / possible regression

Open MathiasSchindler opened this issue 11 months ago • 6 comments

Screenshot from 2024-03-12 09-58-58

I am using whisper.cpp (most recent commit a56f435) with the large-v3 model, and I have the impression that whisper getting stuck in a loop, repeating the same sentence over and over, happens much more frequently than it used to.

I am aware that this is not a new issue as such, and I only have anecdotal evidence that it has become more frequent. Are there any parameters I can use to avoid this behavior, even at the cost of some accuracy or speed?

MathiasSchindler avatar Mar 12 '24 09:03 MathiasSchindler

Do not use large-v3. I haven't checked the status recently, but the last time I did, this model did not work well even with the original OpenAI repo. Try using v2 instead.

ggerganov avatar Mar 12 '24 11:03 ggerganov

I have reports about this happening for some of my users on large-v2 too.

whisper.cpp 1.5.4

sindresorhus avatar Mar 12 '24 11:03 sindresorhus

Do not use large-v3. I haven't checked the status recently, but the last time I did, this model did not work well even with the original OpenAI repo. Try using v2 instead.

Thanks for your fast reply. I used large-v2 on the same file and the loop does not occur. Recognition quality is slightly worse, but nothing I didn't expect. If this is a problem with the model itself, I should report it to OpenAI directly.

MathiasSchindler avatar Mar 12 '24 14:03 MathiasSchindler

Same problem with large-v3. No problems with large-v2.

k1ng avatar Mar 28 '24 17:03 k1ng

I am adding a comment here because someday someone might stumble across this issue via Google, and this could be of service:

I noticed that the number of hallucinations drops drastically when the beam size is changed from the default value of 5 to 8 using the flag -bs 8.

I played a bit with various settings, including combinations of -bo (best of) and -bs, and found that the largest effect comes from increasing -bs to 8.
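For anyone who wants to try this, the invocation can be sketched roughly as follows. This is only a minimal sketch: the model and audio paths are placeholders (assumptions, not taken from this thread), and the binary is assumed to be ./main in the current directory. Only the -bs 8 part reflects the setting described above.

```shell
#!/bin/sh
# Placeholder paths (assumptions); adjust them to your own setup.
MODEL="models/ggml-large-v3.bin"
AUDIO="samples/input.wav"

# -m selects the model file, -f the input audio, -bs the beam size.
# A beam size of 8 (instead of the default 5) is the setting reported above.
set -- ./main -m "$MODEL" -f "$AUDIO" -bs 8

# Print the command for inspection, then run it only if the binary is present.
echo "$@"
if [ -x ./main ]; then
    "$@"
fi
```

A larger beam makes the decoder keep more candidate transcriptions at each step, so expect it to run slower than with the default.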

I would be interested to hear if someone else experiencing these issues (especially with large-v3) can confirm my observation.

MathiasSchindler avatar Jul 12 '24 15:07 MathiasSchindler

Can confirm, there are fewer hallucinations when setting -bs to 8. But for me, whisper.cpp fails with this error:

whisper_full_with_state: failed to decode
./main: failed to process audio

Kostyansa avatar Aug 21 '24 18:08 Kostyansa