
Segments Repeating in a Loop when Using 'prompt_tokens'.

Open IgorKolo opened this issue 1 year ago • 2 comments

I know the repeating-segment 'hallucination' has been reported multiple times in the past. I've read through most of those reports and didn't find a solution. The issue: whisper outputs the same segment text over and over, independently of the actual audio input. This began happening after I started using prompt_tokens from a previous segment.

So we'll start with this one this one's a little bit stronger than that one out of North Miami Beach in Miami shores heads up Miami Beach surfside the rain's start any second now you can probably already feel a few rain drops out there hall over sunny isle's beach could get some strong wind gusts up to 50 miles per hour as this is coming through lots of intense cloud the ground lightning strikes with that band up here in the east where we that flood advisory was talking about this goes until five o'clock for sunrise and plantait. a little bit of rain in the south. Now coming up I'm talking about this flood watch extended into the weekend let's go over to the south. So I'm just going to go back to the next one. I'm going to go back to the next one. So I'm going to go back to the next one. So I'm going to go back to the next one. So I'm going to go back to the next one. So I'm going to go back to the next one. So I'm going to go back to the next one. So I'm going to go back to the next one.

My use case: real-time, segment-by-segment transcription. The input to whisper_full is always a 3 sec audio buffer (anything shorter results in poor accuracy). Each buffer also begins with the last 0.2 sec of the previous one. I have tried models from tiny to medium with similar results. Parameters are set like this:

// Greedy sampling. Note: beam_search.beam_size (set below) only takes effect
// when the strategy is WHISPER_SAMPLING_BEAM_SEARCH.
whisperParams_ = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

// hardware_concurrency() may return 0, so clamp to at least one thread.
const int max_threads = std::max(1, std::min(4, static_cast<int>(std::thread::hardware_concurrency())));

whisperParams_.print_realtime   = true;
whisperParams_.print_progress   = false;
whisperParams_.print_timestamps = true;
whisperParams_.print_special    = false;
whisperParams_.translate        = false;
whisperParams_.language         = "en";
whisperParams_.n_threads        = max_threads;
whisperParams_.offset_ms        = 0;
whisperParams_.no_context       = false;
whisperParams_.single_segment   = true;

// recommended setting to solve the repeated sentence 'hallucination'
// from https://github.com/ggerganov/whisper.cpp/issues/896
whisperParams_.temperature_inc = 0.1f;
whisperParams_.beam_search.beam_size = 5;
whisperParams_.entropy_thold = 2.8f;
whisperParams_.n_max_text_ctx = 64;

// whisperParams_.n_max_text_ctx = 0;  // this solves the repeating segments issue, but accuracy is not good.
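
For context, the buffering described above (3 sec windows, each beginning with the last 0.2 sec of the previous one) can be sketched roughly as follows. This is only an illustration, not my exact code: next_window is a hypothetical helper name, the 16 kHz sample rate is what whisper.cpp expects for its float PCM input, and I am assuming here that the 0.2 sec overlap counts toward the 3 sec total.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed constants based on the description in this post.
constexpr std::size_t kSampleRate     = 16000;            // whisper.cpp input rate (mono float PCM)
constexpr std::size_t kWindowSamples  = 3 * kSampleRate;  // 3.0 sec passed to each whisper_full call
constexpr std::size_t kOverlapSamples = kSampleRate / 5;  // 0.2 sec carried over from the previous buffer

// Hypothetical helper: builds the next buffer from the tail of the previous
// buffer followed by as much fresh audio as fits in the window.
std::vector<float> next_window(const std::vector<float>& previous,
                               const std::vector<float>& fresh) {
    std::vector<float> window;
    window.reserve(kWindowSamples);

    // Copy up to 0.2 sec from the end of the previous buffer (none on the first call).
    const std::size_t carry = std::min(kOverlapSamples, previous.size());
    window.insert(window.end(),
                  previous.end() - static_cast<std::ptrdiff_t>(carry),
                  previous.end());

    // Fill the remainder of the window with new samples.
    const std::size_t room = kWindowSamples - window.size();
    const std::size_t take = std::min(room, fresh.size());
    window.insert(window.end(),
                  fresh.begin(),
                  fresh.begin() + static_cast<std::ptrdiff_t>(take));
    return window;
}
```

The returned vector would then be passed to whisper_full as the samples pointer and sample count.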

On top of these parameters, prompt_tokens and prompt_n_tokens are set from the previous segment. Sometimes it runs without any 'hallucinations', and the output quality and speed are quite acceptable. When it goes into repetition, the output is, of course, useless. Is there a solution for this, please?
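
To make the prompt carry-over concrete, here is a minimal sketch of the idea, with one possible guard added: if the decoded text is identical to the previous segment's text, the prompt is dropped for the next call. The guard is an assumption on my side and not a confirmed fix. PromptCarry and token_id are hypothetical names (token_id stands in for whisper_token), and the cap of 64 in the usage note simply mirrors n_max_text_ctx above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using token_id = std::int32_t;  // stand-in for whisper_token (assumed to be a 32-bit id)

// Hypothetical helper, not part of whisper.cpp: keeps the tokens of the
// previous segment to use as the prompt for the next whisper_full call.
class PromptCarry {
public:
    explicit PromptCarry(std::size_t max_tokens) : max_tokens_(max_tokens) {}

    // Call once after each decoded segment with its text and token ids.
    void update(const std::string& text, const std::vector<token_id>& tokens) {
        if (!text.empty() && text == last_text_) {
            // Same text twice in a row: treat it as a loop and drop the prompt.
            prompt_.clear();
        } else {
            // Keep only the most recent max_tokens_ tokens.
            const std::size_t n = std::min(tokens.size(), max_tokens_);
            prompt_.assign(tokens.end() - static_cast<std::ptrdiff_t>(n), tokens.end());
        }
        last_text_ = text;
    }

    // Values intended for whisper_full_params::prompt_tokens / prompt_n_tokens.
    const token_id* data() const { return prompt_.empty() ? nullptr : prompt_.data(); }
    int size() const { return static_cast<int>(prompt_.size()); }

private:
    std::size_t max_tokens_;
    std::string last_text_;
    std::vector<token_id> prompt_;
};
```

In use, a PromptCarry(64) would be updated after every whisper_full call, and its data() and size() assigned to prompt_tokens and prompt_n_tokens before the next one.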

IgorKolo, Jun 15 '23 01:06