Whisper-WebUI

Subtitle generation is not working properly.

Open • lgs777 opened this issue 1 year ago • 13 comments

Which OS are you using?

  • OS: Windows 11

After the long-awaited update, I tried to generate Chinese subtitles. Partway through the file, from a certain point onward, the generated subtitles contain nothing but numbers:


1286
00:25:55,660 --> 00:25:56,660
90

1287
00:25:56,660 --> 00:25:57,660
90

1288
00:25:57,660 --> 00:25:58,660
90

1289
00:25:58,660 --> 00:25:59,660
90

1290
00:25:59,660 --> 00:26:00,660
90
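In a long SRT file, a small script can find where the output falls into a loop like this. The sketch below assumes standard SRT cues: an index line, a `start --> end` line, then the text, with a blank line between cues. `parse_srt` and `first_repetition_run` are hypothetical helpers and are not part of Whisper-WebUI.

```python
import re


def parse_srt(srt_text):
    """Parse SRT content into (index, start, end, text) tuples."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = [line.strip() for line in block.splitlines()]
        if len(lines) < 3 or "-->" not in lines[1] or not lines[0].isdigit():
            continue  # skip malformed blocks
        start, end = (part.strip() for part in lines[1].split("-->", 1))
        cues.append((int(lines[0]), start, end, " ".join(lines[2:])))
    return cues


def first_repetition_run(cues, min_run=3):
    """Return the SRT index where a run of at least `min_run` consecutive
    cues with identical text begins, or None if there is no such run."""
    run_start = 0
    for i in range(1, len(cues) + 1):
        # Close the current run at the end of the list or when the text changes.
        if i == len(cues) or cues[i][3] != cues[run_start][3]:
            if i - run_start >= min_run:
                return cues[run_start][0]
            run_start = i
    return None
```

For example, `first_repetition_run(parse_srt(open("output.srt", encoding="utf-8-sig").read()))` returns the index of the first cue in the earliest run of repeated text. That tells you where hallucination begins without scrolling through the whole file.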

lgs777 • May 17 '24 07:05

Hi, this looks like Whisper hallucination.

  • Related discussion
    • https://github.com/openai/whisper/discussions/679

Many possible solutions are discussed there.

You can try:

  • Set condition_on_previous_text to False
  • Tune the no_speech_threshold and log_probability_threshold values.

You can adjust these parameters in the "Advanced Parameters" tab of the WebUI.
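If you script faster-whisper directly instead of using the WebUI, these settings are keyword arguments of `WhisperModel.transcribe`. The sketch below assumes the faster-whisper backend. Note that faster-whisper spells the second threshold `log_prob_threshold`, and openai-whisper spells it `logprob_threshold`. The function name is hypothetical, and the threshold defaults are faster-whisper's own defaults, not tuned recommendations.

```python
def anti_hallucination_kwargs(
    condition_on_previous_text: bool = False,
    no_speech_threshold: float = 0.6,
    log_prob_threshold: float = -1.0,
) -> dict:
    """Keyword arguments for faster-whisper's WhisperModel.transcribe().

    condition_on_previous_text is switched off, per the suggestion above;
    the two thresholds default to faster-whisper's defaults.
    """
    return {
        "condition_on_previous_text": condition_on_previous_text,
        "no_speech_threshold": no_speech_threshold,
        "log_prob_threshold": log_prob_threshold,
    }

# With faster-whisper installed, the dict would be passed through like this:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v2")
#   segments, info = model.transcribe("audio.wav", **anti_hallucination_kwargs())
```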

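Setting condition_on_previous_text to False stops Whisper from feeding its previous output back in as the prompt for the next window. The text will be less consistent with the surrounding context, but it helps Whisper escape the "loop of failures" you experienced, because one bad window can no longer seed the next.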

no_speech_threshold and log_probability_threshold control how "sensitive" Whisper is to small sounds. In your case, the problem may be that Whisper is too sensitive to them.

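A window is dropped as silence only when its no-speech probability is above no_speech_threshold and its average log probability is below log_probability_threshold. So lowering no_speech_threshold and raising log_probability_threshold makes Whisper less sensitive to small sounds.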
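Here is a minimal sketch of the silence check these two thresholds feed into, paraphrased from the transcribe loops of openai-whisper and faster-whisper with their defaults of 0.6 and -1.0. The function name is hypothetical. In the real code, the log-probability threshold also triggers temperature fallback, which this sketch leaves out.

```python
def should_skip_as_silence(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: float = 0.6,
    log_prob_threshold: float = -1.0,
) -> bool:
    """Return True if a decoded window would be dropped as silence.

    The window is skipped only when the model thinks it is probably not
    speech AND the decoded text has low confidence.
    """
    if no_speech_prob <= no_speech_threshold:
        return False
    # A confident transcription overrides a high no-speech probability.
    return avg_logprob <= log_prob_threshold
```

With the defaults, a faint sound scored `no_speech_prob=0.55, avg_logprob=-1.2` is kept. Lowering `no_speech_threshold` to 0.5 drops it, which shows the direction each knob moves.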

*Instead of making you tweak these parameters, I'll add a vad_filter parameter that enables the Silero VAD filter for easier use.

jhj0517 • May 17 '24 09:05

The Silero VAD filter was added in #153.

Open the "Advanced Parameters" tab in the WebUI, and check "Enable Silero VAD Filter". If the hallucination still occurs, uncheck "Condition On Previous Text".
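The advice above is an escalation order: try VAD first, and give up context (condition_on_previous_text) only if VAD alone is not enough. The sketch below assumes faster-whisper's keyword names `vad_filter` and `condition_on_previous_text`. `ESCALATION_STEPS`, `transcribe_with_escalation` and the caller-supplied `is_hallucinated` check are all hypothetical and not part of Whisper-WebUI.

```python
# Settings to try in order: VAD first, then VAD plus no conditioning on
# previous text. Keys follow faster-whisper's transcribe() keyword arguments.
ESCALATION_STEPS = [
    {"vad_filter": True},
    {"vad_filter": True, "condition_on_previous_text": False},
]


def transcribe_with_escalation(transcribe, is_hallucinated, steps=ESCALATION_STEPS):
    """Call `transcribe(**settings)` for each step until the output passes.

    `transcribe` returns a list of segment texts and `is_hallucinated`
    judges that list. Returns (settings, texts) for the first passing step,
    or for the last step if none pass.
    """
    if not steps:
        raise ValueError("steps must not be empty")
    for settings in steps:
        texts = transcribe(**settings)
        if not is_hallucinated(texts):
            break
    return settings, texts
```

With faster-whisper, `transcribe` could be something like `lambda **kw: [s.text for s in model.transcribe("audio.wav", **kw)[0]]`, since `model.transcribe` returns a `(segments, info)` pair.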

If the hallucination persists after trying the above, please let me know.

jhj0517 • May 17 '24 10:05

Increasing the temperature also solves this.
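One plausible reason this helps: Whisper already re-decodes a window at higher temperatures when the output looks degenerate. Setting a single higher temperature, assuming the value is passed through as one temperature and not a list, skips the greedy T=0 pass, which is the mode most prone to repetition loops. The sketch below is a simplified paraphrase of openai-whisper's fallback loop with its defaults (temperatures 0.0 to 1.0 in steps of 0.2, compression ratio threshold 2.4, log-probability threshold -1.0). `decode` and `pick_decoding` are hypothetical names, and the real implementation has additional checks.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 byte length divided by zlib-compressed length.

    Highly repetitive text (e.g. "90 90 90 ...") compresses well, so its
    ratio is large; Whisper treats a high ratio as a sign of a decoding loop.
    """
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def pick_decoding(decode, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                  compression_ratio_threshold=2.4, log_prob_threshold=-1.0):
    """Retry decoding at increasing temperatures until the output passes.

    `decode(t)` is a hypothetical callable returning (text, avg_logprob) for
    one audio window decoded at temperature t. Returns (temperature, text).
    """
    if not temperatures:
        raise ValueError("temperatures must not be empty")
    for t in temperatures:
        text, avg_logprob = decode(t)
        too_repetitive = compression_ratio(text) > compression_ratio_threshold
        too_unlikely = avg_logprob < log_prob_threshold
        if not (too_repetitive or too_unlikely):
            return t, text
    # Every temperature failed: keep the last attempt, as Whisper does.
    return t, text
```

In faster-whisper, `transcribe(..., temperature=...)` accepts either a single float or a sequence of floats for this fallback schedule.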

RYG81 • May 21 '24 11:05

I recently ran into the same hallucination issue with Korean. Even with vad_filter enabled and the Advanced Parameters thoroughly adjusted, the same hallucination starts after a certain point.

In my case, switching the model to large-v2 prevents the hallucinations, although the transcription quality is lower.

Previously I had no issues even with large-v3, so I believe the problem definitely lies with Whisper.

windo-developer • May 22 '24 03:05

@jhj0517 Your efforts are always appreciated. Thank you for your feedback.

lgs777 • May 28 '24 16:05

@jhj0517

The methods above still don't solve it. I have no problem with large-v2, only with large-v3. I'm extracting Chinese subtitles.

lgs777 • May 29 '24 10:05

@lgs777 Thanks for pointing this out; I think this is a notable issue. For now, I'll change the default model to large-v2.

jhj0517 • May 29 '24 13:05

Thank you for all your help. I was having hallucination problems when generating subtitles for Japanese conversations, but switching to large-v2 greatly improved things. A little hallucination remained, but raising the temperature to 0.2 eliminated it.

cookiexND • Jun 06 '24 03:06

In #267, I added BGM separation as a preprocessing step to reduce this kind of hallucination.

In my tests, it gave much better results when the audio contains background music. Please feel free to share your results.

jhj0517 • Sep 13 '24 13:09

I was getting unusable translation results until I turned on BGM separation and Silero VAD. It should be made clear that these options are meant for exactly this. A hint in the UI next to the translation toggle would go a VERY long way toward helping users understand what these actually do.

mark-wd • Oct 01 '24 21:10

@mark-wd Thanks for pointing that out. In #308, I updated some labels to make the use of the submodels clearer.

If anyone has suggestions for clearer wording, I'd appreciate them.

jhj0517 • Oct 02 '24 13:10