
Inference results differ from original Whisper with GPU, even when using the same model

Open ssteo opened this issue 3 years ago • 4 comments

Is there any parameter that needs to be added into the implementation like in https://github.com/openai/whisper/tree/main/whisper/assets/multilingual ?

I've tested all models and found the inference results differ from those of the original Whisper running on GPU. I'm wondering whether something is missing in my setup, or whether there is some difference in this project's implementation?

ssteo avatar Dec 10 '22 19:12 ssteo

I don't know this in detail, but it's a different implementation. I found this bit in the original announcement:

Just a note that the whisper.cpp implementation currently only supports the greedy sampling strategy, so to make a fair comparison with PyTorch, you would need to disable the beam search when running it.

(That's from October, though, so I'm not sure if it still applies; things move fast.) The original Whisper itself gives you different results depending on options (beam size etc.), and apparently there is also a possibility of nondeterminism in play.

misutoneko avatar Dec 11 '22 00:12 misutoneko

I also found differences in WER between the large models of the PyTorch implementation and whisper.cpp. whisper.cpp got a worse WER score in my tests on the large model (e.g. 12% vs 18% WER). Is there any way to bring whisper.cpp to the same level of accuracy via settings? Naive question, but I've only recently started learning.

RYucel avatar Dec 11 '22 17:12 RYucel

The decoding strategy in whisper.cpp is not exactly the same as the one in the original OpenAI repo. Differences are to be expected, and whisper.cpp is likely inferior atm. In any case, if you want to make a fair comparison between the two, make sure to run the PyTorch version with the Greedy decoder, as explained in the README.

@RYucel Can you give a tutorial for computing WER? Are you running the PyTorch implementation with the Greedy decoder?

ggerganov avatar Dec 11 '22 18:12 ggerganov
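For reference on the WER question above: WER is the word-level edit distance (substitutions, insertions, deletions) between the reference and hypothesis transcripts, divided by the number of reference words. A minimal pure-Python sketch (tools like `jiwer` additionally normalize case and punctuation before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Single-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i  # prev holds the diagonal cell d[i-1][j-1]
        for j in range(1, len(hyp) + 1):
            tmp = d[j]
            d[j] = min(
                d[j] + 1,                                # deletion
                d[j - 1] + 1,                            # insertion
                prev + (ref[i - 1] != hyp[j - 1]),       # substitution / match
            )
            prev = tmp
    return d[len(hyp)] / len(ref)
```

To compare the two implementations on the same audio, run both, then score each transcript against the same reference text.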

I've encountered this as well with the whisper commandline vs. using whisper from a python script (both have different defaults), see here:

https://github.com/openai/whisper/discussions/591

The default parameters that the python whisper command line tool uses are:

```python
result = model.transcribe(
    "audio.mp3",
    language=language,
    task="transcribe",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    best_of=5,
    beam_size=5,
    suppress_tokens="-1",
    condition_on_previous_text=True,
    fp16=True,
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
    no_speech_threshold=0.6,
)
```

The biggest differences are that the Python Whisper decoder does beam search, conditions each segment on the preceding ones, and backs off to a higher temperature when the compression ratio signals likely faulty output (see the example in the whisper discussion link). whisper.cpp already mentions it doesn't do beam search; my guess is it doesn't do any of the other stuff either.
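The temperature back-off mentioned above can be sketched roughly like this: decode at increasing temperatures and accept the first result whose zlib compression ratio does not signal degenerate repetition. This is a simplified sketch; the real OpenAI decoder also applies `logprob_threshold` and `no_speech_threshold` checks before accepting a result.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Highly repetitive (likely faulty) output compresses very well."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def transcribe_with_fallback(decode,
                             temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                             compression_ratio_threshold=2.4):
    """decode(t) is a stand-in for one decoding pass at temperature t."""
    text = ""
    for t in temperatures:
        text = decode(t)
        if compression_ratio(text) <= compression_ratio_threshold:
            return text, t  # accept: output doesn't look degenerate
    return text, temperatures[-1]  # all temperatures flagged; return last attempt
```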

You can also check whether the outputs become more similar if you set best_of=1, beam_size=1 (or best_of=None, beam_size=None), essentially making Python Whisper do greedy decoding too.
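To see why greedy and beam-search decoding can produce different transcripts, here is a toy sketch over hypothetical next-token probabilities (not Whisper's actual model): greedy commits to the locally best first token, while beam search can keep a weaker prefix alive and find a higher-probability sequence overall.

```python
# Hypothetical next-token probabilities, keyed by the decoded prefix.
STEP_PROBS = {
    (): {"a": 0.5, "b": 0.4, "c": 0.1},
    ("a",): {"x": 0.3, "y": 0.3, "z": 0.4},
    ("b",): {"x": 0.9, "y": 0.05, "z": 0.05},
    ("c",): {"x": 0.4, "y": 0.3, "z": 0.3},
}


def greedy(steps=2):
    """Pick the single most probable token at each step."""
    seq, p = (), 1.0
    for _ in range(steps):
        tok, tp = max(STEP_PROBS[seq].items(), key=lambda kv: kv[1])
        seq, p = seq + (tok,), p * tp
    return seq, p


def beam_search(width=2, steps=2):
    """Keep the `width` most probable prefixes at each step."""
    beams = [((), 1.0)]
    for _ in range(steps):
        candidates = [(s + (t,), p * tp)
                      for s, p in beams
                      for t, tp in STEP_PROBS[s].items()]
        beams = sorted(candidates, key=lambda kv: kv[1], reverse=True)[:width]
    return beams[0]
```

Here greedy chooses prefix "a" (0.5) and ends up with a lower-probability sequence than beam search, which keeps "b" alive and finds "b x" (0.4 × 0.9 = 0.36 vs. 0.5 × 0.4 = 0.2). This is the structural reason the two decoders can disagree even with identical model weights.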

bmilde avatar Jan 03 '23 16:01 bmilde

With the latest version the whisper.cpp results should be better and hopefully closer to the Python implementation.

By default, the main example corresponds to:

  • temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
  • best_of=5
  • beam_size=None
  • suppress_tokens="-1"
  • condition_on_previous_text=True
  • fp16=True
  • compression_ratio_threshold=2.4
  • logprob_threshold=-1.

You can enable beam search via `--beam_size 5`; it is disabled by default.

ggerganov avatar Jan 15 '23 14:01 ggerganov

Hi @ggerganov You have done something phenomenal with this work! Sorry to comment on a closed issue but I was wondering if there is any switch to set --condition_on_previous_text to False?

crisdosaygo avatar Feb 10 '23 03:02 crisdosaygo

@crisdosaygo Passing `--max-context 0` to main should be equivalent to `--condition_on_previous_text False`

ggerganov avatar Feb 14 '23 17:02 ggerganov

Thank you, sir!

crisdosaygo avatar Feb 15 '23 01:02 crisdosaygo