Whisper
endless loop repetition
Hi, I get repeated/hallucinated lines when trying to transcribe: the repeated lines all contain the same words (it stops transcribing and just lists the same words over and over). How can I solve this, please?
-mc 0 helps, but it's a larger issue.
Where can I do this, and how?
You can specify it using the command-line version. Also, the issue has been reported very often; the first occurrence in this repository is from me: https://github.com/Const-me/Whisper/issues/26
I use the desktop version. Do you mean that I should download the source and open it in Visual Studio?
Sorry for the short explanation. No, I did not want you to compile the source code or anything like that. Just download and use the command-line version. The download is called clip.zip and contains main.exe. Call it on the command line like this (of course, you need to replace all the paths in this example with the paths where your files are, or create the same paths that I specify here and place the needed files there):
C:\whisper\main.exe -mc 0 -f C:\temp\test.wav -l de -m C:\whisper\models\ggml-large.bin
You can also put this in a batch file and drag and drop files onto the .bat file to transcribe them. Create a new empty text file on your desktop (or anywhere else) with a .bat extension and the following content:
"C:\whisper\main.exe" -m "C:\whisper\models\ggml-large.bin" -l de -mc 0 -osrt "%~1"
pause
Change the paths to main.exe and the model as needed (but keep the quotes). Also change -l de to your language code, and change -mc 0 to -mc 224 if you want to raise transcription quality at the (current) risk of falling into a repetition loop. You can change -osrt to -ovtt or -otxt; this is the option that enables writing a text file. Save the file, then drag and drop an audio file onto the .bat file. A cmd window should open while the transcription is running; you can close it to cancel, or once it has finished.
A further improvement could be to add a simple registry key that adds a right-click menu entry on .mp3 and .wav files to start the batch file and transcribe. If anyone is interested, please ask.
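For anyone who prefers a script over a .bat file, the same call can be sketched in Python. This is only a rough sketch, not something shipped with the project: the function names build_command and transcribe_all are made up for illustration, the paths are the example paths from this thread, and it passes the input with -f as in the command-line example above. Only flags already mentioned in this thread are used.

```python
import subprocess
import sys

# Assumed locations, copied from the example paths in this thread.
# Adjust them to wherever main.exe and the model live on your machine.
MAIN_EXE = r"C:\whisper\main.exe"
MODEL = r"C:\whisper\models\ggml-large.bin"


def build_command(audio, language="de", max_context=0, output_flag="-osrt",
                  exe=MAIN_EXE, model=MODEL):
    """Return the main.exe argument list for one audio file (hypothetical helper)."""
    return [
        exe,
        "-m", model,
        "-l", language,
        "-mc", str(max_context),  # 0 is the repetition workaround discussed here
        output_flag,              # -osrt, -ovtt or -otxt
        "-f", str(audio),
    ]


def transcribe_all(audio_files):
    """Run main.exe once per file, one after the other."""
    for audio in audio_files:
        cmd = build_command(audio)
        print("Running:", cmd)
        # Passing a list avoids quoting problems with spaces in paths.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Files dropped on the script (or given on the command line) arrive here.
    transcribe_all(sys.argv[1:])
```

Passing max_context=224 to build_command gives the higher-quality variant, with the same repetition risk as described above.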
Fashionably late to repeat the above :-) Yes, -mc 0 is a command-line option; the GUI has limited options. To be honest, I spent hours searching for a detailed explanation of all the command-line options and didn't succeed. The -mc 0 tip came from another thread on the subject.
Thanks a lot, this helped and it now works with no loops, but I sometimes get the message "runFullImpl: failed to generate timestamp token - skipping one second". Is there any solution to this?
@Const-me any chance we can get --max-context exposed in the GUI without having to use the command line? At least until the repetition issue gets fixed by OpenAI and it trickles down.
I downloaded this repository as a ZIP file, but I couldn't find main.exe in it. Where can I find it?
The compiled binaries and the source code are two different things :) The CLI program main.exe is in clip.zip; you can download it from the same place as all the other compiled binaries: https://github.com/Const-me/Whisper/releases/
@emcodem Hi, do you know how to solve this? runFullImpl: failed to generate timestamp token - skipping one second
@eagleftw023 No, I mention the same message in https://github.com/Const-me/Whisper/issues/26. As this message appears at the same spot where the repetition started before for me, I suspected that it might be the actual root cause of all the trouble. However, the general line in this repository seems to be that we wait for the original Whisper project (or the whisper.cpp project) to come up with a solution, so that it can be ported to this repository.
whisper.cpp pushed an update today in v1.3.0 that supposedly might help with the repetitions:
https://github.com/ggerganov/whisper.cpp/releases/tag/v1.3.0
The key commit is this one:
https://github.com/ggerganov/whisper.cpp/commit/f19e23fbd108ec3ac458c7a19b31c930719e7a94
Hopefully we'll see this quickly ported over to this project.
@Const-me
Just tested it with the latest ggerganov master; the problem is not solved. I don't understand why he thinks the problem can be solved just by reducing the number of parallel decoders, but I'm sure he has his reasons.
Users of the CLI version can try the same workaround that I use; there is a download here: https://github.com/Const-me/Whisper/issues/26
OK, newbie here: I used the command line and it seemed to improve things a bit. However, where is my txt file saved? :D
@tondeaf Check the help by running main.exe without any parameters. For example, adding -osrt to the command-line parameters will write an .srt file next to the input file.
Thank you! (Can you set a name for the output text file, or will it always be the input file name + txt?)
Saw this posted today on whisper.cpp, if anybody wants to give it a try:
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
https://github.com/EtienneAb3d/WhisperHallu
Related OpenAI:
https://github.com/openai/whisper/discussions/679
Well, preprocessing is a topic in general, but I fear it is not a general-purpose solution for the looping problem, as it will usually make some situations better and others worse. As there are countless ways of preprocessing and none of them will be perfect for all use cases, I feel that core projects like whisper.cpp and the Const-me version should not attempt preprocessing at all. Instead, they should provide interfaces that make it easy to connect preprocessors to the core software.
@tondeaf I don't think main.exe has a way yet to specify the srt output filename/location.
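Until such an option exists, one possible workaround is to move the file after the transcription has finished. Below is a minimal Python sketch of that idea. The helper name move_transcript is made up for illustration, and since this thread does not confirm the exact output name, the sketch simply tries both "test.srt" and "test.wav.srt" next to the input file.

```python
import shutil
from pathlib import Path


def move_transcript(audio_path, destination, extension=".srt"):
    """Move the transcript written next to audio_path to destination.

    Returns the destination path, or None if no transcript was found.
    """
    audio = Path(audio_path)
    # Assumption: the exact naming scheme is not confirmed in this thread,
    # so look for both "test.srt" and "test.wav.srt" style names.
    candidates = [
        audio.with_suffix(extension),
        audio.with_name(audio.name + extension),
    ]
    for candidate in candidates:
        if candidate.is_file():
            destination = Path(destination)
            # Create the target folder if it does not exist yet.
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(candidate), str(destination))
            return destination
    return None
```

For example, move_transcript(r"C:\temp\test.wav", r"C:\subtitles\episode1.srt") would be called after main.exe has finished; pass extension=".txt" or ".vtt" when using -otxt or -ovtt.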