whisper.cpp
CUDA doesn't use the NVIDIA GPU
What happened?
When transcribing with CUDA on Windows 11 and whisper 1.6.0,
the NVIDIA GPU is used only for a few seconds and only at 1-2% utilization; after that, only the CPU / Intel GPU is used.
As a result, transcribing 1 second of audio takes 30 seconds (OpenBLAS and CUDA enabled).
Log:
Microsoft Windows [Version 10.0.22631.3672]
(c) Microsoft Corporation. All rights reserved.
C:\Users\משפחה\Desktop\test>set RUST_LOG=vibe=debug,whisper_rs=debug
C:\Users\משפחה\Desktop\test>%localappdata%\vibe\vibe.exe --model ggml-medium.bin --file short.wav --language english
C:\Users\משפחה\Desktop\test>[2024-06-05T05:29:50Z DEBUG vibe_desktop] Vibe App Running
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] webview version: 125.0.2535.79
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] CPU Features
{"avx":{"enabled":true,"support":true},"avx2":{"enabled":true,"support":true},"f16c":{"enabled":true,"support":true},"fma":{"enabled":true,"support":true}}
[2024-06-05T05:29:50Z DEBUG vibe_desktop::setup] COMMIT_HASH: bbbb0d102f86e71b7e4304a67798024a712ea63e
Transcribe... 🔄
[2024-06-05T05:29:50Z DEBUG vibe::model] Transcribe called with {
"path": "short.wav",
"model_path": "C:\\Users\\משפחה\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin",
"lang": "en",
"verbose": false,
"n_threads": 4,
"init_prompt": null,
"temperature": 0.4,
"translate": null
}
[2024-06-05T05:29:50Z DEBUG vibe::audio] input is short.wav and output is C:\Users\F3F6~1\AppData\Local\Temp\.tmpTwC2sv.wav
[2024-06-05T05:29:50Z DEBUG vibe::audio::encoder] decoder channel layout is 0
[2024-06-05T05:29:50Z DEBUG vibe::audio] wav reader read from "C:\\Users\\F3F6~1\\AppData\\Local\\Temp\\.tmpTwC2sv.wav"
[2024-06-05T05:29:50Z DEBUG vibe::audio] parsing C:\Users\F3F6~1\AppData\Local\Temp\.tmpTwC2sv.wav
[2024-06-05T05:29:50Z DEBUG vibe::model] open model...
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\משפחה\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin'
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: use gpu = 1
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: flash attn = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: gpu_device = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_init_with_params_no_state: dtw = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: loading model
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_vocab = 51865
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_ctx = 1500
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_state = 1024
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_head = 16
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_audio_layer = 24
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_ctx = 448
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_state = 1024
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_head = 16
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_text_layer = 24
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_mels = 80
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: ftype = 1
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: qntvr = 0
[2024-06-05T05:29:50Z INFO whisper_rs::whisper_sys_log] whisper_model_load: type = 4 (medium)
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: adding 1608 extra tokens
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: n_langs = 99
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-06-05T05:29:51Z INFO whisper_rs::whisper_sys_log] whisper_model_load: CUDA0 total size = 1533.14 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_model_load: model size = 1533.14 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_backend_init: using CUDA backend
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv self size = 150.99 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv cross size = 150.99 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: kv pad size = 6.29 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (conv) = 28.68 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (encode) = 594.22 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (cross) = 7.85 MB
[2024-06-05T05:29:57Z INFO whisper_rs::whisper_sys_log] whisper_init_state: compute buffer (decode) = 142.09 MB
[2024-06-05T05:29:57Z DEBUG vibe::model] set language to Some("en")
[2024-06-05T05:29:58Z DEBUG vibe::model] setting temperature to 0.4
[2024-06-05T05:29:58Z DEBUG vibe::model] setting n threads to 4
[2024-06-05T05:29:58Z DEBUG vibe::model] set start time...
[2024-06-05T05:29:58Z DEBUG vibe::model] setting state full...
[2024-06-05T05:30:15Z DEBUG vibe::model] getting segments count...
[2024-06-05T05:30:15Z DEBUG vibe::model] found 1 segments
[2024-06-05T05:30:15Z DEBUG vibe::model] looping segments...
1
0:00:00,000 --> 0:00:01,600
Experience proves this.
Transcription completed in 25.1s ⏱️
Done ✅
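For context, the timestamps in the log above can be turned into a rough real-time factor. A minimal sketch; the 17 s inference window is an assumption read off the gap between `setting state full` (05:29:58) and `getting segments count` (05:30:15), with model load accounting for most of the remaining time:

```python
# Rough real-time factor (RTF) from the log above.
audio_seconds = 1.6        # clip length, from the SRT timestamp 0:00:00,000 --> 0:00:01,600
inference_seconds = 17.0   # assumed window: 05:29:58 -> 05:30:15 in the log
rtf = inference_seconds / audio_seconds
print(f"RTF ~ {rtf:.1f}x slower than real time")  # ~ 10.6x
```

An RTF this far above 1x for a short clip on a medium model is consistent with the decode running on the CPU rather than the GPU, despite the `using CUDA backend` lines during model load.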
GPU: NVIDIA MX230
Laptop model: C340-15IM
Windows 11
CUDA 12.5.0
Related: https://github.com/thewh1teagle/vibe/issues/102