
vulkan backend should use VK_KHR_cooperative_matrix if possible

airlied opened this issue 11 months ago · 3 comments

Feature Description

Vulkan has an extension, VK_KHR_cooperative_matrix, that lets implementations expose hardware-accelerated matrix multiplication for certain matrix shapes and element types.

This seems like it should be useful for implementing llama.cpp's matmul, at least. I'm interested in whether anyone sees a reason why this wouldn't be useful. I'll start playing around with it myself, but I'm not yet up to speed on how to hook this into llama.cpp.
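For context, the shader side of the extension (GL_KHR_cooperative_matrix in GLSL) lets a subgroup cooperatively multiply-accumulate fixed-size matrix tiles. A rough sketch of what a matmul inner loop could look like is below; the tile sizes, bindings, and strides here are illustrative only, since the supported shapes and types must be queried at runtime via `vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR`:

```glsl
#version 450
#extension GL_KHR_cooperative_matrix : enable
#extension GL_KHR_memory_scope_semantics : enable
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable

layout(local_size_x = 32, local_size_y = 1, local_size_z = 1) in;

layout(binding = 0) readonly buffer BufA { float16_t a[]; };
layout(binding = 1) readonly buffer BufB { float16_t b[]; };
layout(binding = 2) buffer BufD { float d[]; };

// Hypothetical tile sizes; real values must match a shape reported by
// vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR for the device.
const uint M = 16, N = 16, K = 16;

layout(push_constant) uniform Params { uint strideA, strideB, strideD, k; } p;

void main() {
    // Accumulator tile, shared cooperatively across the subgroup.
    coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseAccumulator> acc =
        coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseAccumulator>(0.0);

    // Walk the shared dimension K elements at a time: D = A * B.
    for (uint i = 0; i < p.k; i += K) {
        coopmat<float16_t, gl_ScopeSubgroup, M, K, gl_MatrixUseA> matA;
        coopmat<float16_t, gl_ScopeSubgroup, K, N, gl_MatrixUseB> matB;
        coopMatLoad(matA, a, i, p.strideA, gl_CooperativeMatrixLayoutRowMajor);
        coopMatLoad(matB, b, i * p.strideB, p.strideB,
                    gl_CooperativeMatrixLayoutRowMajor);
        acc = coopMatMulAdd(matA, matB, acc);
    }
    coopMatStore(acc, d, 0, p.strideD, gl_CooperativeMatrixLayoutRowMajor);
}
```

A real kernel would also tile over workgroups and handle edge tiles; this only shows the core coopmat load/multiply-add/store pattern the extension provides.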

airlied avatar Mar 15 '24 06:03 airlied

That is something I plan to do eventually. The thing is that the GPUs that most benefit from this are Nvidia GPUs, which are already well served by the CUDA backend. I have considered doing this for Intel GPUs, because in theory they support it and might benefit a lot from it, but my rough first attempt at using the extension only showed that, at least on Linux with the Mesa driver, Intel still segfaults when trying to load shaders that use it.

For that reason I have not pursued it further, for now. If you want to look at the code and try to fix it, go ahead. The shader does run on Nvidia, but doesn't output the correct results yet.

0cc4m avatar Mar 17 '24 10:03 0cc4m

I'd like to ship llama.cpp in a distro that can't really build CUDA, so I want to use the Vulkan or OpenCL backends and make them as fast as possible on Mesa's Vulkan implementations. I'll see if I can work this out.

airlied avatar Mar 17 '24 21:03 airlied

Sure, let me know if you need help. You could also reach me on Discord for faster replies.

0cc4m avatar Mar 21 '24 16:03 0cc4m

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar May 05 '24 01:05 github-actions[bot]