whisper.cpp
whisper entering a transcription loop / possible regression
I am using whisper.cpp (most recent commit a56f435) and the large-v3 model and I am getting the impression that the phenomenon of whisper getting stuck in a loop of repeating the same sentence over and over again is getting much more frequent than it used to be.
I am aware that this is not a new issue as such, and I only have anecdotal evidence to suggest that it has been getting more frequent. Are there any parameters I can use to avoid this behavior, even at the cost of some accuracy and/or speed?
Do not use large-v3. I haven't checked its status recently, but the last time I did, this model did not work well even with the original OpenAI repo. Try using large-v2 instead.
I have reports about this happening for some of my users on large-v2 too.
whisper.cpp 1.5.4
Thanks for your fast reply. I used large-v2 on the same file and the loop does not occur. Recognition quality is slightly worse, but nothing that I didn't expect. If this is a problem with the model itself, I should be reporting that to OpenAI directly.
Same problem with large-v3. No problems with large-v2.
I am adding a comment here because some day someone might stumble across this issue via Google, and this could be of service to them:
I noticed that the number of hallucinations drops drastically when the beam size is changed from the default value of 5 to 8 using the flag -bs 8.
I played a bit with various settings, including combinations of -bo (best of) and -bs, and I found that the largest effect comes from increasing -bs to 8.
I would be interested to hear if someone else experiencing these issues (especially with large-v3) can confirm my observation.
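For anyone who wants to try this, here is a minimal sketch of such an invocation. The model and audio paths are assumptions for illustration (adjust them to your own checkout and files); only the -bs 8 setting comes from the observation above.

```shell
# Hypothetical paths; replace with your own model and 16 kHz WAV file.
MODEL="models/ggml-large-v3.bin"
AUDIO="samples/jfk.wav"

# Beam size reported above to reduce repetition loops (default is 5).
BEAM_SIZE=8

# Build the command line; the paths here must not contain spaces,
# because the string is expanded unquoted below.
CMD="./main -m $MODEL -f $AUDIO -bs $BEAM_SIZE"
echo "$CMD"

# Run the transcription only if the whisper.cpp binary is present.
if [ -x ./main ]; then
    $CMD
fi
```

A larger beam size means more decoding work per segment, so expect slower transcription in exchange for fewer loops.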
Can confirm, there are fewer hallucinations when setting -bs to 8. But for me whisper.cpp then fails with the error:
whisper_full_with_state: failed to decode
./main: failed to process audio