whisper.cpp
CUDA doesn't use the NVIDIA GPU
What happened?
When transcribing with CUDA on Windows 11 and whisper 1.6.0,
the NVIDIA GPU is used only for a few seconds and only at 1-2% utilization; after that, only the CPU / Intel GPU is used.
As a result, transcribing 1 second of audio takes 30 seconds (OpenBLAS and CUDA enabled).
Log:
Microsoft Windows [Version 10.0.22631.3672]
(c) Microsoft Corporation. All rights reserved.
C:\Users\משפחה\Desktop\test>set RUST_LOG=vibe=debug,whisper_rs=debug
C:\Users\משפחה\Desktop\test>%localappdata%\vibe\vibe.exe --model ggml-medium.bin --file short.wav --language english
C:\Users\משפחה\Desktop\test>[2024-06-05T05:29:50Z DEBUG vibe_desktop] Vibe App Running
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] webview version: 125.0.2535.79
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] CPU Features
{"avx":{"enabled":true,"support":true},"avx2":{"enabled":true,"support":true},"f16c":{"enabled":true,"support":true},"fma":{"enabled":true,"support":true}}
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] COMMIT_HASH: bbbb0d102f86e71b7e4304a67798024a712ea63e
Transcribe... 🔄
[2024-06-05T05:29:50Z DEBUG vibe::model] Transcribe called with {
"path": "short.wav",
"model_path": "C:\\Users\\משפחה\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
"lang": "en",
"verbose": false,
"n_threads": 4,
"init_prompt": null,
"temperature": 0.4,
"translate": null
}
[2024-06-05T05:29:50Z DEBUG vibe::audio] input is short.wav and output is C:\Users\F3F6~1\AppData\Local\Temp\.tmpTwC2sv.wav
[2024-06-05T05:29:50Z DEBUG vibe::audio::encoder] decoder channel layout is 0
[2024-06-05T05:29:50Z DEBUG vibe::audio] wav reader read from "C:\\Users\\F3F6~1\\AppData\\Local\\Temp\\.tmpTwC2sv.wav"
[2024-06-05T05:29:50Z DEBUG vibe::audio] parsing C:\Users\F3F6~1\AppData\Local\Temp\.tmpTwC2sv.wav
[2024-06-05T05:29:50Z DEBUG vibe::model] open model...
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\משפחה\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: use gpu = 1
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: flash attn = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: gpu_device = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: dtw = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: loading model
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_vocab = 51865
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_ctx = 1500
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_state = 1024
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_head = 16
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_layer = 24
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_ctx = 448
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_state = 1024
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_head = 16
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_layer = 24
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_mels = 80
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: ftype = 1
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: qntvr = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: type = 4 (medium)
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: adding 1608 extra tokens
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_langs = 99
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: CUDA0 total size = 1533.14 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_model_load: model size = 1533.14 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv self size = 150.99 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv cross size = 150.99 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv pad size = 6.29 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (conv) = 28.68 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (encode) = 594.22 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (cross) = 7.85 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (decode) = 142.09 MB
[2024-06-05T05:29:57Z DEBUG vibe::model] set language to Some("en")
[2024-06-05T05:29:58Z DEBUG vibe::model] setting temperature to 0.4
[2024-06-05T05:29:58Z DEBUG vibe::model] setting n threads to 4
[2024-06-05T05:29:58Z DEBUG vibe::model] set start time...
[2024-06-05T05:29:58Z DEBUG vibe::model] setting state full...
[2024-06-05T05:30:15Z DEBUG vibe::model] getting segments count...
[2024-06-05T05:30:15Z DEBUG vibe::model] found 1 segments
[2024-06-05T05:30:15Z DEBUG vibe::model] looping segments...
1
0:00:00,000 --> 0:00:01,600
Experience proves this.
Transcription completed in 25.1s ⏱️
Done ✅
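For context, the timestamps in the log above can be turned into a rough real-time factor. A minimal sketch; the 17 s inference window is an assumption read off the gap between `setting state full` (05:29:58) and `getting segments count` (05:30:15), with model load accounting for most of the remaining time:

```python
# Rough real-time factor (RTF) from the log above.
audio_seconds = 1.6        # clip length, from the SRT timestamp 0:00:00,000 --> 0:00:01,600
inference_seconds = 17.0   # assumed window: 05:29:58 -> 05:30:15 in the log
rtf = inference_seconds / audio_seconds
print(f"RTF ~ {rtf:.1f}x slower than real time")  # ~ 10.6x
```

An RTF this far above 1x for a short clip on a medium model is consistent with the decode running on the CPU rather than the GPU, despite the `using CUDA backend` lines during model load.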
GPU: NVIDIA MX230
Laptop model: C340-15IM
Windows 11
CUDA 12.5.0
Related: https://github.com/thewh1teagle/vibe/issues/102