llama-cpp-python
Anyone built with vulkan yet?
I updated llama.cpp and compiled with `vulkan=1`, but I get an error about compiling with `-fPIC` enabled.
@Ph0rk0z give it a shot again with v0.2.36; the cmake build was fixed in https://github.com/ggerganov/llama.cpp/pull/5182.
llama-cpp-python worked fine with Vulkan last night (on Linux) when I built it with my PR https://github.com/ggerganov/llama.cpp/pull/5182.
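Roughly how I rebuilt the Python package against it, as a sketch: the `-DLLAMA_VULKAN` cmake option name is taken from llama.cpp's build flags of that version, and the equivalent shell one-liner is `CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install --force-reinstall llama-cpp-python`.

```python
# Reinstall llama-cpp-python with llama.cpp's Vulkan backend enabled.
# Assumes the -DLLAMA_VULKAN cmake option from that llama.cpp version.
import os
import subprocess
import sys

env = dict(os.environ, CMAKE_ARGS="-DLLAMA_VULKAN=on")
subprocess.check_call(
    [sys.executable, "-m", "pip", "install",
     "--force-reinstall", "--no-cache-dir", "llama-cpp-python"],
    env=env,
)
```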
It built, but I have no way to select which GPU to use.
I got it to compile and install, and I can generate text much faster with it than with CLBlast, but sadly the save state doesn't work. I will probably open a new issue about it tomorrow.
I'm not really sure what I'm doing wrong with it personally, but I did manage to get it to compile and run in Docker. I am assuming it is selecting the wrong GPU, but I am not sure how to change it. I tried changing `main_gpu` from 0 to 1 when initializing `Llama`, but got the same result. My laptop has an AMD GPU and an Nvidia GPU, and it needs to run on the Nvidia one.
Any ideas?
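Roughly what I tried, for reference (the model path is a placeholder):

```python
from llama_cpp import Llama

# Attempt to force the second GPU; main_gpu=1 made no difference for me.
llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder
    main_gpu=1,       # device index for the main computation
    n_gpu_layers=-1,  # offload all layers
)
```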
@Josh-XT Have you tried with lower values of `n_ctx`? 32768 is a lot. Try something more reasonable like 1024.
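For example (the model path is a placeholder):

```python
from llama_cpp import Llama

# n_ctx=0 requests the model's full training context, which can be huge;
# a fixed, modest context keeps the KV-cache allocations small.
llm = Llama(model_path="/path/to/model.gguf", n_ctx=1024)
```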
I'll give that a try. I actually just had it set to 0.
@Josh-XT Also, probably related to your GPU selection issue: https://github.com/ggerganov/llama.cpp/issues/5259#issuecomment-1921709786
You can choose the GPU with the `GGML_VULKAN_DEVICE` environment variable; `0` is the first GPU or the CPU, depending on your setup.
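For example, from Python (a sketch; the model path is a placeholder):

```python
import os

# Select the second Vulkan device. llama.cpp reads this when the Vulkan
# backend initializes, so setting it before importing llama_cpp is the
# safe order.
os.environ["GGML_VULKAN_DEVICE"] = "1"

from llama_cpp import Llama  # import after setting the variable

llm = Llama(model_path="/path/to/model.gguf", n_gpu_layers=-1)
```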
I tried it out on two machines, one with accelerated Nvidia drivers, one without.
Vulkan recognizes the proprietary Nvidia driver. Performance is right in between Nvidia's `cuBLAS` (using all GPU layers) and `openBLAS`/`cuBLAS` without GPU support.

Proprietary Nvidia drivers:

- `cuBLAS` with all graphics layers (`-ngl 99`): 33 tokens/sec.
- Proprietary Nvidia `Vulkan` with GPU: 22 tokens/sec.
- Proprietary Nvidia `cuBLAS` without `-ngl 99`: 14 tokens/sec.

Open-source `nouveau` mesa drivers:

- `openBLAS` (CPU-only): 13-14 tokens/sec.
- Proprietary Nvidia `Vulkan` (CPU-only): 12 tokens/sec.
- Plain compile (no options): 7 tokens/sec.
I'll keep using `cuBLAS`, but I'll keep an eye on it. Performance will no doubt improve, since the debugging code and safety checks add significant CPU overhead at this stage of development.
Update: I'm aware there is a third, official "open-source" Nvidia driver on the Nvidia website that is supposed to work with Vulkan. But I'm on mobile, so the download keeps timing out.
I have successfully built llama.cpp with Vulkan, but I can't yet build llama-cpp-python with Vulkan. I always get a core dump when loading the model. With llama.cpp it works fine.
@userbox020 are you using cmake to build llama.cpp?
I'm using the Vulkan compile of llama.cpp.