whisper.cpp

Fine-tuned Whisper models are very slow

Open 10-zin opened this issue 2 years ago • 2 comments

I tried to run fine-tuned medium Whisper models.

  1. The standard OpenAI models are fast (inference takes about half the audio duration; 16 threads, 1 process).
  2. Even the standard Hugging Face Whisper models are fast (inference takes about half the audio duration; 16 threads, 1 process).

But the fine-tuned Whisper models are not fast at all:

  • They take more than double the audio duration to transcribe, and get slower as the file length increases.
  • Scaling the number of threads and processes gives no benefit.

Would really like to know if anyone is facing the same issue, and how to solve it. @ggerganov .. My insights so far:

  1. I have usually observed that after fine-tuning, the Hugging Face models also stop predicting timestamps; they just hard-segment at 30 seconds. Maybe that has something to do with it.
  2. Or the way the .cpp counterparts of the fine-tuned models are converted may have an effect, although that didn't affect the non-fine-tuned Hugging Face Whisper models.

10-zin avatar Mar 16 '23 12:03 10-zin

The fallback implementation is currently suboptimal, and I think this is what causes the slow performance. Try using --no-fallback for now; in the future we will try to improve the performance of fallbacks.

ggerganov avatar Mar 22 '23 19:03 ggerganov
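For context: the "fallback" here is Whisper's temperature fallback, where a segment is re-decoded at progressively higher temperatures whenever quality heuristics fail (the reference implementation checks the gzip compression ratio and the average token log-probability). Below is a minimal, self-contained Python sketch of that retry loop; `fake_decode` is a hypothetical stand-in for the model's decoder, not part of whisper.cpp.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Raw length divided by zlib-compressed length; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def transcribe_with_fallback(decode,
                             temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                             compression_threshold=2.4,
                             logprob_threshold=-1.0):
    """Retry decoding at increasing temperatures until the quality checks pass.

    `decode(temperature)` must return (text, avg_logprob). The thresholds
    match the defaults in OpenAI's reference Whisper implementation.
    """
    result = None
    for t in temperatures:
        text, avg_logprob = decode(t)
        result = text
        if (compression_ratio(text) <= compression_threshold
                and avg_logprob >= logprob_threshold):
            break  # decode looks sane; no further fallback needed
    return result

# Toy decoder: low temperatures produce a degenerate repetition loop,
# which trips the compression-ratio check and forces a retry.
def fake_decode(temperature):
    if temperature < 0.4:
        return "la " * 50, -0.2   # repetitive text -> high compression ratio
    return "hello world", -0.3

print(transcribe_with_fallback(fake_decode))  # → hello world
```

Each failed check costs a full extra decode of the segment, which is why a model whose outputs often trip these heuristics (as a fine-tune without timestamp training apparently can) ends up several times slower than the base model.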

@ggerganov awesome!!! this worked. For me it transcribes a 30-second segment in 17-18 seconds, which is relatively fast! With fallback it varies a lot and often takes more than 2 minutes.

All your works are mindblowing! Thank you for existing! Huge inspiration, keep pushing the boundaries!

10-zin avatar Mar 23 '23 01:03 10-zin

Perhaps this is a stupid question, but how do I implement --no-fallback with a fine-tuned model using the Hugging Face pipeline? A fine-tuned version of whisper-medium.en is taking 8 seconds for a 3-second clip.

DaRealDJ avatar Dec 13 '23 21:12 DaRealDJ

> Perhaps this is a stupid question, but how do I implement --no-fallback with a fine-tuned model using the Hugging Face pipeline? I'm having a finetuned version of whisper-medium.en take 8 seconds for a 3 second clip.

You can just provide it as an argument to whisper.cpp:

./main -m ../ggml-small-model.bin -l si -bs 0 -d 7000 --no-fallback -debug -bo 1 -pp -ps -t 16 -f samples/test.wav

Dharisd avatar Dec 17 '23 07:12 Dharisd