
Vulkan Optimizations and Fixes

0cc4m opened this pull request.

I have implemented a number of Vulkan optimizations and fixes:

  • Implement a REPEAT operator shader to fix the low performance of the previous copy-based Vulkan implementation
  • Use the GLSL fma() instruction where possible
  • Add a GGML_VULKAN_PERF option to collect approximate performance data for a running model
  • Rework and fix Vulkan descriptor set handling; in my tests this improves performance on AMD RADV
  • Fix a validation error in the float32 concat f16 shader
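To illustrate what the REPEAT operator computes, here is a minimal CPU reference sketch in Python. It is not the Vulkan shader from this PR; the `repeat` helper and its flat-list-plus-shape interface are hypothetical. Shapes are listed slowest-to-fastest (NumPy order), which is the reverse of ggml's `ne[]` ordering. Each output element reads the source element at `i % src_dim` along every axis, so every output can be computed independently. That maps naturally onto one shader invocation per element, which is presumably why a dedicated shader outperforms a sequence of copies.

```python
from itertools import product
from math import prod


def repeat(src, src_shape, dst_shape):
    """Tile a flat, row-major tensor `src` of shape `src_shape` up to `dst_shape`.

    Hypothetical reference helper (not ggml's API). Every dst dimension must be
    an integer multiple of the matching src dimension.
    """
    if len(src_shape) != len(dst_shape):
        raise ValueError("shapes must have the same rank")
    for s, d in zip(src_shape, dst_shape):
        if s == 0 or d % s != 0:
            raise ValueError("each dst dim must be a multiple of the src dim")
    if len(src) != prod(src_shape):
        raise ValueError("src length does not match src_shape")

    def flat_index(idx, shape):
        # Row-major linearisation: the last axis varies fastest.
        off = 0
        for i, n in zip(idx, shape):
            off = off * n + i
        return off

    # product() walks dst indices in row-major order, matching the flat output.
    return [
        src[flat_index(tuple(i % s for i, s in zip(idx, src_shape)), src_shape)]
        for idx in product(*(range(d) for d in dst_shape))
    ]


# A 1x3 row tiled to 2x6: both rows repeat [1, 2, 3] twice.
print(repeat([1, 2, 3], (1, 3), (2, 6)))
```

The explicit modulo per axis keeps the mapping stateless, so the same index arithmetic could be evaluated per invocation on a GPU without any coordination between elements.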

I will keep this as a draft while I check a few more things, but feel free to test and benchmark. Don't expect a huge difference.


0cc4m, Aug 09 '24 20:08

I missed a validation issue in #8943, but the fix is now in this branch. I think this is ready for review and merge.

0cc4m, Aug 11 '24 09:08

@ggerganov @slaren Could one of you review the non-Vulkan parts of this PR and approve them if they look fine?

0cc4m, Aug 14 '24 14:08