
whisper.cpp doesn't run

Open alosslessdev opened this issue 3 months ago • 5 comments

zluda_trace logs (tarball/zip file)

log.zip

Description

whisper.cpp, when compiled with CUDA support, doesn't work with ZLUDA. It reports that the CUDA driver version is insufficient for the CUDA runtime version, but the same build works without ZLUDA on an NVIDIA card.

whisper_init_from_file_with_params_no_state: loading model from '/home/asdf/Downloads/whisper.cpp-master/models/ggml-base.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_cuda_init: failed to initialize CUDA: CUDA driver version is insufficient for CUDA runtime version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3060 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 3
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      Vulkan0 total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_init_state: kv self size  =    6.29 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   17.24 MB
whisper_init_state: compute buffer (encode) =   85.88 MB
whisper_init_state: compute buffer (cross)  =    4.66 MB
whisper_init_state: compute buffer (decode) =   97.29 MB

Steps to reproduce

from whisper.cpp:

First clone the repository:

git clone https://github.com/ggml-org/whisper.cpp.git

Navigate into the directory:

cd whisper.cpp

Then, download one of the Whisper models converted in ggml format. For example:

sh ./models/download-ggml-model.sh base.en

Now build the whisper-cli example and transcribe an audio file like this:

# build the project
cmake --fresh -B build -D WHISPER_FFMPEG=yes -DGGML_CUDA=1 -DGGML_VULKAN=1
cmake --build build -j --config Release

# transcribe an audio file
./build/bin/whisper-cli -f samples/jfk.wav
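The report doesn't show how whisper-cli was launched under ZLUDA. On Linux this is typically done by putting ZLUDA's libcuda.so ahead of the real driver on the loader path; a minimal sketch, assuming ZLUDA was unpacked to a hypothetical $HOME/zluda (adjust the path to the actual install location):

```shell
# Hypothetical ZLUDA location -- adjust as needed.
ZLUDA_DIR="$HOME/zluda"

# Prepend ZLUDA's directory so its libcuda.so.1 shadows the system CUDA
# driver; the CUDA runtime linked into whisper-cli then resolves to ZLUDA.
LD_LIBRARY_PATH="$ZLUDA_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
    ./build/bin/whisper-cli -f samples/jfk.wav
```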

ZLUDA version

5

Operating System

EndeavourOS (rolling)

GPU

AMD Radeon RX Vega 7

alosslessdev avatar Oct 05 '25 17:10 alosslessdev

Hi, ZLUDA will not work for your setup because ZLUDA does not support GPUs older than RDNA. I'm leaving this open because I want to eventually look into whisper anyway.

vosen avatar Oct 10 '25 18:10 vosen

Hi, ZLUDA will not work for your setup because ZLUDA does not support GPUs older than RDNA. I'm leaving this open because I want to eventually look into whisper anyway.

Could I add support or is it a hardware limitation?

alosslessdev avatar Oct 10 '25 18:10 alosslessdev

ZLUDA supports AMD Radeon RX 5000 series and newer GPUs (both desktop and integrated). Older consumer GPUs (Polaris, Vega, etc.) and server-class GPUs are not supported; these architectures differ significantly from recent desktop GPUs and would require substantial engineering effort.

vosen avatar Oct 10 '25 18:10 vosen

It would be better to make whisper.cpp run on HIP, since GGML, the library it uses, supports it; that would be easier than making CUDA code run on pre-RDNA GPUs. https://en.wikipedia.org/wiki/RDNA_(microarchitecture) Basically, the changes AMD made for RDNA made its GPUs more similar to Nvidia's, which helps ZLUDA a lot; before RDNA, AMD GPUs were rather different, and ZLUDA would be much harder to develop.

Fran2789 avatar Dec 03 '25 04:12 Fran2789
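For context on the HIP suggestion above: GGML's HIP backend is selected at configure time. A minimal sketch, assuming a working ROCm install under /opt/rocm; the exact flag name has changed across ggml versions (older trees used GGML_HIPBLAS), so check the whisper.cpp build docs for the tree in use:

```shell
# Configure whisper.cpp against AMD's HIP/ROCm stack instead of CUDA.
# GGML_HIP is the current flag name; older ggml trees used GGML_HIPBLAS.
cmake -B build -DGGML_HIP=1 \
      -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang \
      -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++
cmake --build build -j --config Release
```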

It would be better to make whisper.cpp run on HIP, since GGML, the library it uses, supports it; that would be easier than making CUDA code run on pre-RDNA GPUs. https://en.wikipedia.org/wiki/RDNA_(microarchitecture) Basically, the changes AMD made for RDNA made its GPUs more similar to Nvidia's, which helps ZLUDA a lot; before RDNA, AMD GPUs were rather different, and ZLUDA would be much harder to develop.

Also opencl

alosslessdev avatar Dec 14 '25 05:12 alosslessdev