whisper.cpp doesn't run
zluda_trace logs (tarball/zip file)
Description
whisper.cpp, when compiled with CUDA support, doesn't work with ZLUDA. It reports "CUDA driver version is insufficient for CUDA runtime version", but the same build works without ZLUDA on an NVIDIA card.
whisper_init_from_file_with_params_no_state: loading model from '/home/asdf/Downloads/whisper.cpp-master/models/ggml-base.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_cuda_init: failed to initialize CUDA: CUDA driver version is insufficient for CUDA runtime version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3060 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 3
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: Vulkan0 total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 17.24 MB
whisper_init_state: compute buffer (encode) = 85.88 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 97.29 MB
Steps to reproduce
From the whisper.cpp README:
First clone the repository:
git clone https://github.com/ggml-org/whisper.cpp.git
Navigate into the directory:
cd whisper.cpp
Then, download one of the Whisper models converted in ggml format. For example:
sh ./models/download-ggml-model.sh base.en
Now build the whisper-cli example and transcribe an audio file like this:
# build the project
cmake --fresh -B build -D WHISPER_FFMPEG=yes -DGGML_CUDA=1 -DGGML_VULKAN=1
cmake --build build -j --config Release
# transcribe an audio file
./build/bin/whisper-cli -f samples/jfk.wav
ZLUDA version
5
Operating System
EndeavourOS (rolling)
GPU
AMD Radeon RX Vega 7
Hi, ZLUDA will not work for your setup because ZLUDA does not support GPUs older than RDNA. I'm leaving this open because I want to eventually look into whisper anyway
Could I add support or is it a hardware limitation?
ZLUDA supports AMD Radeon RX 5000 series and newer GPUs (both desktop and integrated). Older consumer GPUs (Polaris, Vega, etc.) and server-class GPUs are not supported; these architectures differ significantly from recent desktop GPUs and would require substantial engineering effort.
It would be better to make whisper.cpp run on HIP, since GGML, the library it uses, already supports it; that would be easier than making CUDA code run on pre-RDNA GPUs. See https://en.wikipedia.org/wiki/RDNA_(microarchitecture): the changes AMD made for RDNA brought its GPUs much closer to NVIDIA's, which helps ZLUDA a lot. Before RDNA, AMD GPUs were architecturally quite different, and ZLUDA for them would be much harder to develop.
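For reference, a minimal HIP build sketch, under assumptions: a working ROCm/hipcc install, and that the current GGML CMake option is GGML_HIP (older releases used GGML_HIPBLAS); check the whisper.cpp README for the flags that match your checkout:
# sketch only: HIP build, assuming ROCm is installed and GGML_HIP is the current option name
cmake --fresh -B build -DGGML_HIP=1 -DAMDGPU_TARGETS=gfx90c   # gfx90c is a guess for the Vega 7 iGPU; set your GPU's actual gfx target
cmake --build build -j --config Release
# transcribe as before
./build/bin/whisper-cli -f samples/jfk.wav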
Also OpenCL (GGML has an OpenCL backend as well).
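Similarly, a hedged sketch of an OpenCL build, assuming the backend is enabled with GGML_OPENCL and an OpenCL driver for the GPU is installed (backend availability varies by whisper.cpp/GGML version):
# sketch only: OpenCL build, assuming GGML_OPENCL is the current option name
cmake --fresh -B build -DGGML_OPENCL=1
cmake --build build -j --config Release
./build/bin/whisper-cli -f samples/jfk.wav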