whisper.cpp Very Slow Processing Time

I put in a 20 second audio clip, most of which is silent, and it took nearly 45 seconds to process the whole thing. Am I just missing some trick to get it to run faster? These are the params I'm sending to it for reference:

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    {
        wparams.print_progress   = false;
        wparams.print_special    = false;
        wparams.print_realtime   = false;
        wparams.print_timestamps = false;
        wparams.translate        = false;
        wparams.single_segment   = false;
        wparams.max_tokens       = 32;
        wparams.language         = "en";
        wparams.n_threads        = std::max(1, std::min(8, (int32_t) std::thread::hardware_concurrency()));
        wparams.audio_ctx        = 768;
        wparams.speed_up         = false;
        wparams.temperature_inc  = wparams.temperature_inc; // self-assignment: no-op, keeps the default
        wparams.prompt_tokens    = nullptr;
        wparams.prompt_n_tokens  = 0;
    }

May 04 '23 21:05 Pigarian

What hardware are you running it on? What model and arguments are you passing? The large model on older hardware can definitely take some time.

May 05 '23 02:05 wesgould

Probably 10 to 15 seconds or less to process and 30 to 35 seconds to load the model. That's normal.

May 05 '23 05:05 mrfragger

I'm using Whisper in a custom program, so I've had to hard-code the parameters as shown above. I'm using the base model. I'm running on an Intel i5-10300H @ 2.5 GHz, so I wouldn't be surprised to learn that that's the culprit. I call whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin"); long before calling whisper_full.

May 05 '23 19:05 Pigarian

I am experiencing a similar issue. I'm trying to integrate whisper.cpp into a Qt (6.5) GUI application and basically adapt the command example to display recognized speech in a text field.

When I use the command example, speech gets transcribed almost immediately. However, when I use whisper.cpp in my app, the transcription result is just fine, but processing takes about 10 times as long as in the example. One difference is that I use the QAudioSource class instead of SDL and convert the QByteArray to a std::vector (I don't know if that might cause the issue).
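
Since the byte-to-float conversion is the main difference from the command example, it is worth ruling out. whisper_full takes 32-bit float samples, mono, at 16 kHz (WHISPER_SAMPLE_RATE). Below is a minimal sketch of such a conversion, assuming the capture format is signed 16-bit little-endian mono PCM. The name pcm16_to_float is hypothetical, and a plain byte pointer stands in for the QByteArray data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts signed 16-bit little-endian mono PCM bytes
// into float samples in the range [-1, 1). A trailing odd byte is ignored.
std::vector<float> pcm16_to_float(const unsigned char * bytes, std::size_t n_bytes) {
    const std::size_t n_samples = n_bytes / 2;
    std::vector<float> out;
    out.reserve(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        // Assemble each sample from its two bytes explicitly, so the result
        // does not depend on the host's endianness.
        const std::uint16_t u = static_cast<std::uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        const std::int16_t  s = static_cast<std::int16_t>(u);
        out.push_back(static_cast<float>(s) / 32768.0f);
    }
    return out;
}
```

With a QByteArray buf, the call would look like pcm16_to_float(reinterpret_cast<const unsigned char *>(buf.constData()), buf.size()). If the audio source is not actually delivering 16 kHz mono 16-bit samples, the data would need resampling or a different conversion first.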

It seems that the problem is somewhere in the encoding:

    whisper_print_timings:     load time =   109.72 ms
    whisper_print_timings:     fallbacks =     0 p /   0 h
    whisper_print_timings:      mel time =    93.35 ms
    whisper_print_timings:   sample time =    32.85 ms /  35 runs (   0.94 ms per run)
    whisper_print_timings:   encode time =  8134.68 ms /   1 runs (8134.68 ms per run)
    whisper_print_timings:   decode time =   391.61 ms /  33 runs (  11.87 ms per run)
    whisper_print_timings:    total time = 10151.20 ms

Because the command example runs very fast on the same machine, I don't think it is a hardware limitation. For reference, I am using an AMD Ryzen 5 5600X @ 3.7 GHz.

Jun 06 '23 11:06 JJJJaymax

I had slow transcriptions depending on which build settings I was using. These settings improved it for me:

        settings["LLVM_LTO"] = "YES"
        settings["OTHER_CFLAGS"] = ["-O3", "-DNDEBUG"]

Jun 16 '23 09:06 23inhouse

@23inhouse Where did you add these settings?

Jul 17 '23 17:07 shaynemei

@shaynemei I use Tuist to manage Xcode settings, but if you search for LLVM_LTO and DNDEBUG in this repo or on Google, you can see where they are set in the xcodeproj files. There is also a way to set them in the Xcode GUI, or you could set them directly in the files. I'm sorry, I can't remember the exact steps.

Jul 21 '23 12:07 23inhouse

@JJJJaymax Did you find a solution to your issue? I also want to use whisper.cpp in my Qt 6.67 application.

Jun 04 '24 10:06 kittechAI

Did you build with CUDA enabled?

Jun 04 '24 21:06 ulatekh

No. I need to run it on the CPU. The computer that I want it to run on does not have GPU support. Can you please help?

Jun 10 '24 19:06 kittechAI

Same problem.

Mar 03 '25 11:03 KernelInterrupt

I solved it by setting CMAKE_BUILD_TYPE to "release".
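
A build without compiler optimizations would be consistent with an encoder that runs many times slower than the prebuilt example. Below is a minimal sketch for checking, from inside the application, whether the current translation unit was compiled with optimizations. The name is_optimized_build is hypothetical; __OPTIMIZE__ is a predefined macro on GCC and Clang, and NDEBUG is only a convention that release configurations typically define.

```cpp
// Hypothetical helper: reports whether this translation unit was compiled
// with optimizations, as far as the predefined macros can tell.
constexpr bool is_optimized_build() {
#if defined(__OPTIMIZE__)
    return true;   // GCC/Clang define __OPTIMIZE__ when optimizing
#elif defined(NDEBUG)
    return true;   // other compilers: fall back to the NDEBUG convention
#else
    return false;  // likely a debug or unoptimized build
#endif
}
```

This only reflects how the file containing the check was compiled. It is a useful proxy when whisper.cpp is built as part of the same CMake project and shares its build type, but it says nothing about a separately built library.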

Mar 03 '25 16:03 KernelInterrupt

I eventually encountered the same issue. Newer versions are extremely CPU-intensive and unusable in my case. I ran git checkout v1.5.1; that version is fast and works for me on my Ryzen 5 7600.

Jun 14 '25 08:06 simonsays-techtalk