
Anyone built with Vulkan yet?

Ph0rk0z opened this issue 1 year ago • 12 comments

I updated llama.cpp and compiled with vulkan=1, but I get an error telling me to compile with -fPIC enabled.

Ph0rk0z avatar Jan 28 '24 19:01 Ph0rk0z

@Ph0rk0z give it a shot again with v0.2.36; the cmake build was fixed in https://github.com/ggerganov/llama.cpp/pull/5182

abetlen avatar Jan 29 '24 16:01 abetlen
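For anyone following along: with that fix in, the bindings can be rebuilt with the Vulkan backend enabled by passing the CMake flag through pip, e.g. `CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir` (the flag name matches llama.cpp's build option of that era and may have changed in later releases).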

llama-cpp-python worked fine with Vulkan last night (on Linux) when I built it with my PR https://github.com/ggerganov/llama.cpp/pull/5182.

netrunnereve avatar Jan 29 '24 20:01 netrunnereve

It built, but I have no way to select which GPU to use.

Ph0rk0z avatar Jan 30 '24 01:01 Ph0rk0z

I got it to compile and install, and I can generate text much faster with it than with CLBlast, but sadly the save state doesn't work. I will probably open a new issue about it tomorrow.

stduhpf avatar Feb 01 '24 01:02 stduhpf

[screenshot]

I'm not really sure what I'm doing wrong personally, but I did manage to get it to compile and run in Docker. I'm assuming it is selecting the wrong GPU, but I'm not sure how to change it. I tried changing main_gpu from 0 to 1 when initializing Llama, but got the same result. My laptop has an AMD GPU and an NVIDIA GPU; it needs to run on the NVIDIA one.

Any ideas?

Josh-XT avatar Feb 01 '24 16:02 Josh-XT

@Josh-XT Have you tried with lower values of n_ctx? 32768 is a lot. Try something more reasonable like 1024.

stduhpf avatar Feb 01 '24 16:02 stduhpf

> @Josh-XT Have you tried with lower values of n_ctx? 32768 is a lot. Try something more reasonable like 1024.

I'll give that a try. I actually just had it set to 0.

Josh-XT avatar Feb 01 '24 17:02 Josh-XT
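For context, here is a minimal sketch of the suggested change in Python (the model path is a placeholder; n_ctx=0 asks llama.cpp to fall back to the model's full training context, which can allocate a very large KV cache):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_ctx=1024,       # a modest context keeps the KV cache small on low-VRAM GPUs
    n_gpu_layers=-1,  # offload all layers to the GPU backend
)
print(llm("Q: Does Vulkan work? A:", max_tokens=32)["choices"][0]["text"])
```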

@Josh-XT Also, probably related to your GPU selection issue: https://github.com/ggerganov/llama.cpp/issues/5259#issuecomment-1921709786

stduhpf avatar Feb 01 '24 18:02 stduhpf

You can choose the GPU with the GGML_VULKAN_DEVICE environment variable; 0 is the first GPU (or the CPU, depending on your setup).
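For example, a minimal sketch of selecting a device from Python (the device index and model path are placeholders; the variable has to be set before llama_cpp loads its Vulkan backend):

```python
import os

# Pick the second Vulkan device; set this before importing llama_cpp.
os.environ["GGML_VULKAN_DEVICE"] = "1"

from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf", n_gpu_layers=-1)  # placeholder path
```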

I tried it out on two machines, one with accelerated Nvidia drivers, one without.

Vulkan recognizes the proprietary Nvidia driver. Performance is right in between Nvidia's cuBLAS (using all GPU layers) and OpenBLAS/cuBLAS without GPU offload.

Proprietary Nvidia drivers:

  • cuBLAS with all GPU layers (-ngl 99): 33 tokens/sec.
  • Vulkan with GPU offload: 22 tokens/sec.
  • cuBLAS without -ngl 99: 14 tokens/sec.

Open-source nouveau mesa drivers:

  • OpenBLAS (CPU-only): 13-14 tokens/sec.
  • Proprietary Nvidia Vulkan (CPU-only): 12 tokens/sec.
  • Plain compile (no options): 7 tokens/sec.

I'll keep using cuBLAS, but keep an eye on it. Performance will no doubt improve, since the debugging code and safety checks still add significant CPU overhead at this stage of development.

Update: I'm aware there is a third, official "open-source" Nvidia driver on the Nvidia website that is supposed to work with Vulkan. But I'm on mobile, so it's timing out on download.

themanyone avatar Feb 02 '24 05:02 themanyone

I have successfully built llama.cpp with Vulkan, but can't yet build llama-cpp-python with Vulkan. I always get a core dump when loading the model. With llama.cpp it works fine.

userbox020 avatar Feb 16 '24 03:02 userbox020

@userbox020 are you using cmake to build llama.cpp?

abetlen avatar Feb 17 '24 05:02 abetlen

> @userbox020 are you using cmake to build llama.cpp?

I'm compiling llama.cpp with Vulkan.

userbox020 avatar May 03 '24 20:05 userbox020
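One quick way to check whether an installed llama-cpp-python wheel was actually compiled with the expected backend is to print llama.cpp's system-info string through the low-level bindings (assuming your version exposes it, as recent ones do):

```python
import llama_cpp

# Mirrors llama.cpp's llama_print_system_info(); the returned bytes list the
# compile-time features baked into the bundled shared library.
print(llama_cpp.llama_print_system_info().decode())
```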