llama.cpp
Vulkan backend should use VK_KHR_cooperative_matrix if possible
Feature Description
Vulkan has an extension (VK_KHR_cooperative_matrix) that allows implementations to expose hardware matrix multipliers with certain properties.
This seems like it should be useful for implementing llama.cpp's matmul, at least. I'm interested whether anyone sees a reason why this isn't useful. I'll start playing around with it myself, but I'm not yet up to speed on how to hook this into llama.cpp.
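For context, here is a minimal sketch of how the extension's capabilities might be queried from the host side. This is not llama.cpp code; the function name is made up for illustration, and it assumes Vulkan headers recent enough to define VK_KHR_cooperative_matrix plus an already-created VkInstance and VkPhysicalDevice:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

// Illustrative helper, not part of llama.cpp: list the cooperative matrix
// shapes a device advertises. The extension's query function must be loaded
// through the instance with vkGetInstanceProcAddr.
void print_coopmat_props(VkInstance instance, VkPhysicalDevice phys_dev) {
    auto fp = (PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR)
        vkGetInstanceProcAddr(instance, "vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR");
    if (!fp) {
        printf("VK_KHR_cooperative_matrix not available\n");
        return;
    }

    uint32_t count = 0;
    fp(phys_dev, &count, nullptr);

    std::vector<VkCooperativeMatrixPropertiesKHR> props(count);
    for (auto & p : props) {
        p.sType = VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_KHR;
        p.pNext = nullptr;
    }
    fp(phys_dev, &count, props.data());

    // Each entry is one supported MxNxK shape / component-type combination.
    for (const auto & p : props) {
        printf("MxNxK = %ux%ux%u, A type %d, C type %d, scope %d\n",
               p.MSize, p.NSize, p.KSize, p.AType, p.CType, p.scope);
    }
}
```

Each returned entry describes one MxNxK shape and component-type combination the driver can accelerate, which is what a matmul shader would presumably have to be specialized against.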
That is something I plan to do eventually. The thing is that the GPUs that benefit most from this are Nvidia GPUs, which are already well served by the CUDA backend. I have considered doing this for Intel GPUs, because in theory they support it and might benefit a lot from it, but my basic (still incorrect) attempt at using the extension has only shown that, at least on Linux with the Mesa driver, Intel still segfaults when trying to load shaders with it.
For that reason I have not pursued it further, for now. If you want to look at the code and try to fix it, go ahead. The shader does run on Nvidia, but doesn't output the correct results yet.
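In case it helps whoever picks this up: since a driver can advertise the extension string while its actual support is still broken, it probably makes sense to gate use of the extension on both the device extension list and the feature bit before enabling it at device creation. Here is a sketch of that kind of guard, not llama.cpp's actual code (the helper name is illustrative; assumes Vulkan 1.1+ headers with the KHR extension definitions):

```cpp
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

// Illustrative helper: true only if the device lists the extension AND
// reports the cooperativeMatrix feature as enabled-able.
bool supports_coopmat(VkPhysicalDevice phys_dev) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(phys_dev, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(phys_dev, nullptr, &count, exts.data());

    bool has_ext = false;
    for (const auto & e : exts) {
        if (strcmp(e.extensionName, VK_KHR_COOPERATIVE_MATRIX_EXTENSION_NAME) == 0) {
            has_ext = true;
        }
    }
    if (!has_ext) return false;

    // The extension string can be present while the feature is unsupported,
    // so query the feature struct as well (vkGetPhysicalDeviceFeatures2 is
    // core since Vulkan 1.1).
    VkPhysicalDeviceCooperativeMatrixFeaturesKHR coopmat_features = {};
    coopmat_features.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_COOPERATIVE_MATRIX_FEATURES_KHR;

    VkPhysicalDeviceFeatures2 features2 = {};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &coopmat_features;
    vkGetPhysicalDeviceFeatures2(phys_dev, &features2);

    return coopmat_features.cooperativeMatrix == VK_TRUE;
}
```

Note this only tells you what the driver claims; it wouldn't have caught the Mesa/Intel segfault described above, so testing on real hardware is still needed.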
I'd like to ship llama.cpp in a distro that can't really build CUDA, so I want to use the Vulkan or OpenCL backends and make them as optimal as possible on Mesa's Vulkan implementation. I'll see if I can work this out.
Sure, let me know if you need help. You could also reach me on Discord for faster replies.
This issue was closed because it has been inactive for 14 days since being marked as stale.