whisper.cpp
App is getting into endless loop
I am running whisper.cpp inside Docker, and as a POC I translated 800 WAV files.
In a few cases (fewer than 5), the client gets stuck in an endless loop at a certain point in the audio. If I tell it to start a second after the loop point, it transcribes the audio as expected. A few notes:
- The audio is english
- It happens across all models
- With the Python version, translation works fine.
- It is consistent: it always happens at the same point in time.
This is the output I get (I changed some of the text for privacy reasons):
.....
[00:00:30.000 --> 00:00:32.000] Some valid text bla bla
[00:00:32.000 --> 00:00:35.000] Some valid text bla bla
[00:00:35.000 --> 00:00:53.000] Some valid text bla bla
[00:00:53.000 --> 00:01:09.000] Some valid text bla bla
[00:01:09.000 --> 00:01:30.000] We were working at high school.
[00:01:30.000 --> 00:01:45.000] We were working at high school.
[00:01:45.000 --> 00:02:06.000] We were working at high school.
[00:02:06.000 --> 00:02:21.000] We were working at high school.
[00:02:21.000 --> 00:02:40.000] We were working at high school.
...
It continues like that until the end.
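To find which transcripts in a large batch are affected, one option is to scan each output for runs of identical consecutive segments. This is a minimal Python sketch that assumes the `[start --> end] text` segment format shown above; `find_repeated_runs` and the threshold of 3 repeats are hypothetical choices, not part of whisper.cpp.

```python
import re

# Matches segment lines such as
# "[00:01:09.000 --> 00:01:30.000] We were working at high school."
SEGMENT_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)"
)


def find_repeated_runs(lines, min_repeats=3):
    """Return (start_time, text, count) for each run of identical consecutive segments."""
    runs = []
    prev_text, run_start, count = None, None, 0
    for line in lines:
        m = SEGMENT_RE.match(line.strip())
        if not m:
            continue  # skip non-segment lines (log output, blank lines, "...")
        start, _end, text = m.groups()
        text = text.strip()
        if text == prev_text:
            count += 1
        else:
            if count >= min_repeats:
                runs.append((run_start, prev_text, count))
            prev_text, run_start, count = text, start, 1
    if count >= min_repeats:  # flush a run that lasts until the end of the file
        runs.append((run_start, prev_text, count))
    return runs
```

Running it over each output file and keeping the files with a non-empty result gives a shortlist of transcripts to re-check, for example with a start offset just past the reported time.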
Yeah, the same has happened to me with Chinese audio files. It only happens at very few positions in the audio, but when it does, it continues until the end of the audio, and it seems to happen again, consistently, when I restart the transcription process.
I also have the same problem when I have to transcribe in Portuguese. Is there any solution for that?
Have you checked this: https://github.com/ggerganov/whisper.cpp/discussions/408#discussioncomment-4780401
With the parameter "--max-context 0", it no longer loops until the end of the file. It still produces some loops, but they are relatively short. Thank you!
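Why limiting the context can help may be sketched as follows. As far as I understand it (an assumption, not checked against the current source), whisper.cpp conditions each new audio window on tokens from the previously decoded text, and the max-context setting caps how many of those tokens are carried over. Once a phrase starts repeating, feeding it back as context makes the decoder more likely to produce it again; with a value of 0, nothing is carried over. The function `select_past_context` is hypothetical and only models that truncation; the cap at half of a 448-token text context is also an assumption.

```python
def select_past_context(past_tokens, max_context, n_text_ctx=448):
    """Hypothetical model of how a max-context limit trims carried-over text tokens.

    Keeps at most `max_context` of the most recent previously decoded tokens,
    additionally capped at half of the decoder's text context (assumed 448 tokens).
    """
    n_take = min(max_context, n_text_ctx // 2, len(past_tokens))
    if n_take <= 0:
        return []  # nothing from earlier segments conditions the next window
    return past_tokens[-n_take:]


# A looping phrase accumulates in the carried-over context...
looping = [101, 102, 103, 104] * 20
print(len(select_past_context(looping, max_context=16384)))  # 80: the whole loop is fed back
print(select_past_context(looping, max_context=0))           # []: no past text at all
```

The trade-off is that, without past context, the decoder also loses legitimate cues (names, spelling, punctuation style) from earlier in the recording.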
This behavior occurs when the entropy-based repetition detection fails. It can sometimes be mitigated by adjusting the entropy threshold, as explained here:
https://github.com/ggerganov/whisper.cpp/issues/471#issuecomment-1416947068
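The idea behind entropy-based repetition detection can be sketched in Python. The details are assumptions based on my reading of whisper.cpp: the entropy of the token distribution over roughly the last 32 tokens of a decoded sequence is compared with the entropy threshold (assumed here to be the CLI's `--entropy-thold` option, default 2.4), and a sequence below it is treated as a failed decode. The function names are hypothetical. The length condition also illustrates one plausible way the check can miss the loop in this issue: each looping segment is a single short sentence, so no individual segment is ever long enough to be checked.

```python
import math
from collections import Counter


def token_entropy(tokens, window=32):
    """Shannon entropy (natural log) of the token frequencies in the last `window` tokens."""
    recent = tokens[-window:]
    if not recent:
        return 0.0
    n = len(recent)
    return -sum((c / n) * math.log(c / n) for c in Counter(recent).values())


def looks_repetitive(tokens, entropy_thold=2.4, window=32):
    """Flag a decoded sequence as a likely repetition loop.

    Assumption: only sequences longer than `window` tokens are checked, so a short
    segment is never flagged, even if the same segment repeats window after window.
    """
    if len(tokens) <= window:
        return False
    return token_entropy(tokens, window) < entropy_thold


print(looks_repetitive([7, 8, 9, 10, 11] * 10))  # True: 5 distinct tokens, entropy ~1.6
print(looks_repetitive(list(range(40))))          # False: 32 distinct tokens, entropy ~3.47
print(looks_repetitive([7, 8, 9, 10, 11, 12]))    # False: too short to be checked
```

Raising the threshold makes the check fire on less extreme repetition, at the risk of rejecting legitimate low-variety text.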
A more robust strategy needs to be implemented.
Alternatively, you can try the beam-search decoder, but it will make processing slower.
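To see why beam search is slower, here is a toy Python sketch of the general technique. It is not whisper.cpp's decoder, and `toy_model` is a made-up stand-in for the model. Greedy decoding keeps a single hypothesis, while beam search keeps `beam_size` of them and evaluates the model once per hypothesis at every step, so the cost grows roughly with the beam size. In exchange, it can recover a sequence whose first token was not the single most likely choice.

```python
import math


def beam_search(next_probs, beam_size, steps):
    """Toy beam search; next_probs(prefix) returns {token: probability}."""
    beams = [((), 0.0)]  # (token prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            # one model evaluation per live hypothesis: this is where the extra cost comes from
            for tok, p in next_probs(prefix).items():
                candidates.append((prefix + (tok,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # keep only the best beam_size hypotheses
    return beams[0]


# Hypothetical next-token table standing in for the decoder
TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"X": 0.5, "Y": 0.5},
    ("B",): {"Z": 0.9, "W": 0.1},
}


def toy_model(prefix):
    return TABLE[prefix]


print(beam_search(toy_model, beam_size=1, steps=2)[0])  # ('A', 'X'): greedy, p = 0.30
print(beam_search(toy_model, beam_size=2, steps=2)[0])  # ('B', 'Z'): p = 0.36
```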