llama.cpp
Vulkan Optimizations and Fixes
I have implemented a number of Vulkan optimizations and fixes:
- Implement a REPEAT operator shader to fix the low performance of the copy-based Vulkan implementation
- Use GLSL FMA instruction where possible
- Add GGML_VULKAN_PERF option to get approximate performance data about a running model
- Rework and fix Vulkan descriptor set handling; this improves performance in my tests on AMD RADV
- Fix validation error on float32 concat f16 shader
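
For context on the REPEAT item, here is a rough CPU-side sketch of the index mapping that a dedicated repeat kernel can compute per output element, as opposed to issuing one copy per tile. It is not the shader from this PR: `repeat_2d` is a hypothetical helper, and the row-major 2D layout and the requirement that destination dimensions be multiples of the source dimensions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ggml): tiles a row-major 2D source
// into a destination whose dimensions are multiples of the source's.
std::vector<float> repeat_2d(const std::vector<float>& src,
                             std::size_t src_rows, std::size_t src_cols,
                             std::size_t dst_rows, std::size_t dst_cols) {
    assert(src.size() == src_rows * src_cols);
    assert(dst_rows % src_rows == 0 && dst_cols % src_cols == 0);

    std::vector<float> dst(dst_rows * dst_cols);

    // Each iteration stands in for one GPU invocation: it handles exactly
    // one destination element and reads the source with wrapped indices.
    for (std::size_t idx = 0; idx < dst.size(); ++idx) {
        const std::size_t r = idx / dst_cols;  // destination row
        const std::size_t c = idx % dst_cols;  // destination column
        dst[idx] = src[(r % src_rows) * src_cols + (c % src_cols)];
    }
    return dst;
}
```

Because every output element is computed independently from a single wrapped read, the whole operation maps onto one dispatch instead of a series of separate copies.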
I will keep this as a draft while I check a few more things, but feel free to test and benchmark. Don't expect a huge difference.
- [x] I have read the contributing guidelines
- Self-reported review complexity:
  - [ ] Low
  - [x] Medium
  - [ ] High
I missed a validation issue in #8943, but the fix is now in this branch. I think this is ready for review and then merge.
@ggerganov @slaren Can one of you review the non-Vulkan parts of this PR and approve if that's fine?