whisper.cpp
whisper.cpp copied to clipboard
How to use with a AMD GPU?
I try to build with Vulkan support . but GPU usage is still very low . I already install rocm.
OS: ubuntu 22.04 ROCm: 6.3.2
My test with Vulkan and AMD iGPU (7840) on Windows works great.
I managed to run with patched hip 6.2.4, but its a pitty AMD is not officially supporting this...
I am experiencing this same issue. Loading the large-v3-turbo model hits the CPU only and gpu is effectively idle, even though it is detected and shaders compiled upon starting the application. Disabling threads or processors by setting them to 0 just crashes the application (whisper_server) upon receiving the first request.
OS: windows 11
For the last few years I have been running a third-party compiled version of whisper.cpp from the whisper-desktop project. It was old enough that it still used main.exe. It reliably used my AMD integrated GPU and ran very quickly (7-8X with medium.en). However I found that this old binary from 2022 does not work with quantized models, and the new release binary from the actions page of this project does not work with my AMD GPU. Unfortunately, my Linux machine doesn't have an equivalent GPU. It would be really nice to get this working with modern whisper.cpp on Windows 11. Is this a me problem? Can I somehow build it myself? Annoyingly, I know much more about how to do that on Linux than I do on Windows. Is this just a sign from the universe that I need to build an ML machine that runs Linux?
I can say that it is not working on integrated AMD graphics with vulkan. Even when i run whisper.cpp as sudo and i disable the nvidia card using nvidia-smi, whisper.cpp still fails to use the integrated amd graphics even though it is recognized as a vulkan device, here it is on arch linux:
[asdf@asdf-82ju whisper.cpp-master]$ sudo nvidia-smi drain -p 0000:01:00.0 -m 1
Successfully set GPU 00000000:01:00.0 drain state to: draining.
[asdf@asdf-82ju whisper.cpp-master]$ sudo ./build/bin/whisper-cli -f /run/media/asdf/ad44bde3-3115-40c6-b841-c5571894fe8d/asdf/test.wav -m /home/asdf/Downloads/whisper.cpp-master/models/ggml-base.bin --language es
whisper_init_from_file_with_params_no_state: loading model from '/home/asdf/Downloads/whisper.cpp-master/models/ggml-base.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.28 MB
whisper_init_state: compute buffer (encode) = 85.88 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 96.37 MB
system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | REPACK = 1 |
main: processing '/run/media/asdf/ad44bde3-3115-40c6-b841-c5571894fe8d/asdf/test.wav' (10005149 samples, 625.3 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = es, task = transcribe, timestamps = 1 ...