whisper.cpp

Very Slow Processing Time

Open Pigarian opened this issue 2 years ago • 10 comments

I put in a 20 second audio clip, most of which is silent, and it took nearly 45 seconds to process the whole thing. Am I just missing some trick to get it to run faster? These are the params I'm sending to it for reference:

whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    {
        wparams.print_progress   = false;
        wparams.print_special    = false;
        wparams.print_realtime   = false;
        wparams.print_timestamps = false;
        wparams.translate        = false;
        wparams.single_segment   = false;
        wparams.max_tokens       = 32;
        wparams.language         = "en";
        wparams.n_threads        = std::max(1, std::min(8, (int32_t) std::thread::hardware_concurrency()));
        wparams.audio_ctx        = 768;
        wparams.speed_up         = false;
        wparams.temperature_inc  = wparams.temperature_inc;
        wparams.prompt_tokens    = nullptr;
        wparams.prompt_n_tokens  = 0;
    }

Pigarian avatar May 04 '23 21:05 Pigarian

What hardware are you running it on? What model and arguments are you passing? The large model on older hardware can definitely take some time.

wesgould avatar May 05 '23 02:05 wesgould

Probably 10 to 15 seconds or less to process and 30 to 35 seconds to load the model. That's normal.

mrfragger avatar May 05 '23 05:05 mrfragger

I'm using Whisper in a custom program, so I've had to hard-code the parameters as shown above. I'm using the base language model. I'm running on an Intel i5-10300H @ 2.5 GHz, so I wouldn't be surprised to learn that the CPU is the culprit. I run whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin"); long before calling whisper_full.

Pigarian avatar May 05 '23 19:05 Pigarian
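Since the replies above disagree on whether the time goes to loading the model or to inference, timing the two calls separately would settle it. The following is a minimal sketch using only the C++ standard library; `time_ms` is a hypothetical helper introduced here for illustration and is not part of whisper.cpp.

```cpp
#include <chrono>
#include <utility>

// Runs `fn` once and returns the elapsed wall-clock time in milliseconds.
// Hypothetical helper for illustration; not part of the whisper.cpp API.
template <typename Fn>
double time_ms(Fn&& fn) {
    // steady_clock is monotonic, so the result is unaffected by system clock changes.
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the whisper_init_from_file call in one lambda and the whisper_full call in another, each passed to time_ms, would show which of the two accounts for the 45 seconds.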

I am experiencing a similar issue. I'm trying to integrate whisper.cpp into a Qt (6.5) GUI application and basically adapt the command example to display recognized speech in a text field.

When I use the command example, speech gets transcribed almost immediately. However, when I use whisper.cpp in my app, the transcription result is just fine, but processing takes about 10 times as long as in the example. One difference is that I use the QAudioSource class instead of SDL and convert the QByteArray to a std::vector (I don't know if that might cause the issue).

It seems that the problem is somewhere in the encoding:

whisper_print_timings:     load time =   109.72 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    93.35 ms
whisper_print_timings:   sample time =    32.85 ms /    35 runs (    0.94 ms per run)
whisper_print_timings:   encode time =  8134.68 ms /     1 runs ( 8134.68 ms per run)
whisper_print_timings:   decode time =   391.61 ms /    33 runs (   11.87 ms per run)
whisper_print_timings:    total time = 10151.20 ms

Because the command example runs super fast, I don't think it is necessarily a hardware limitation. But for reference I am using a machine with an AMD Ryzen 5 5600X @ 3.7 GHz

JJJJaymax avatar Jun 06 '23 11:06 JJJJaymax
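To rule out the QByteArray conversion mentioned above, it may help to compare it against a known-good one. whisper_full takes mono float samples at 16 kHz (WHISPER_SAMPLE_RATE). The sketch below assumes the audio source delivers native-endian signed 16-bit mono PCM at that rate, which the post does not state; `pcm16_to_float` is a hypothetical helper name, and it takes a raw pointer and length so that it compiles without Qt.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Converts native-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0).
// Hypothetical helper; assumes mono audio already sampled at 16 kHz.
std::vector<float> pcm16_to_float(const char* data, std::size_t n_bytes) {
    // A trailing odd byte cannot form a full sample, so it is dropped.
    const std::size_t n_samples = n_bytes / sizeof(std::int16_t);
    std::vector<float> out(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        std::int16_t s;
        // memcpy avoids undefined behavior from unaligned int16_t reads.
        std::memcpy(&s, data + i * sizeof(s), sizeof(s));
        out[i] = static_cast<float>(s) / 32768.0f;
    }
    return out;
}
```

With a QByteArray named bytes, the call would be pcm16_to_float(bytes.constData(), static_cast<std::size_t>(bytes.size())). If the source is configured for a different sample format, channel count, or rate, this conversion does not apply as written.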

I had slow transcriptions depending on which settings I was using. These settings improved it for me:

        settings["LLVM_LTO"] = "YES"
        settings["OTHER_CFLAGS"] = ["-O3", "-DNDEBUG"]

23inhouse avatar Jun 16 '23 09:06 23inhouse
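One way to check whether flags like -O3 and -DNDEBUG actually reach the compiler is to have the program report them. This is a rough sketch with hypothetical function names. It relies on the `__OPTIMIZE__` macro, which GCC and Clang predefine when optimization is enabled and MSVC does not. It also only describes the translation unit it is compiled in, so it would need to be built with the same flags as the whisper.cpp sources to say anything about them.

```cpp
#include <string>

// True when NDEBUG was defined at compile time (for example via -DNDEBUG).
constexpr bool ndebug_defined() {
#ifdef NDEBUG
    return true;
#else
    return false;
#endif
}

// True when GCC or Clang compiled this file with optimization enabled.
// MSVC does not define __OPTIMIZE__, so this always returns false there.
constexpr bool optimizer_enabled() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Builds a one-line summary suitable for logging at program startup.
std::string build_summary() {
    return std::string("NDEBUG=") + (ndebug_defined() ? "1" : "0") +
           " optimized=" + (optimizer_enabled() ? "1" : "0");
}
```

Logging build_summary() at startup gives a quick hint: a result of NDEBUG=0 optimized=0 suggests an unoptimized debug build, which is a plausible cause of an encode step that runs many times slower than in the stock examples.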

@23inhouse Where did you add these settings?

shaynemei avatar Jul 17 '23 17:07 shaynemei

@shaynemei I use Tuist to manage Xcode settings, but if you search for LLVM_LTO and DNDEBUG in this repo or on Google, you can see where they are set in the xcodeproj files. There is a way to set those in the Xcode GUI. You could also set them directly in the files. I'm sorry I can't remember the exact steps.

23inhouse avatar Jul 21 '23 12:07 23inhouse

@JJJJaymax Did you find a solution to your issue? I also want to use whisper.cpp in my Qt 6.67 application.

kittechAI avatar Jun 04 '24 10:06 kittechAI

Did you build with CUDA enabled?

ulatekh avatar Jun 04 '24 21:06 ulatekh

No. I need to run it on a CPU instead. The computer that I want it to run on does not have GPU support. Can you please help?

kittechAI avatar Jun 10 '24 19:06 kittechAI

Same problem.

KernelInterrupt avatar Mar 03 '25 11:03 KernelInterrupt

I solved it by setting CMAKE_BUILD_TYPE to "Release".

KernelInterrupt avatar Mar 03 '25 16:03 KernelInterrupt

I eventually encountered the same issue. Newer versions are extremely CPU-intensive and useless in my case. I used git checkout v1.5.1; this is a fast version and works for me on my Ryzen 5 7600.

simonsays-techtalk avatar Jun 14 '25 08:06 simonsays-techtalk