Vulkan Stable Diffusion Operators
I implemented the operators necessary for stable-diffusion.cpp to run using Vulkan. The corresponding PR is https://github.com/leejet/stable-diffusion.cpp/pull/291
Image generation works now, but I want to add some minor things for LoRA/TAESD (https://github.com/leejet/stable-diffusion.cpp/pull/291#issuecomment-2256572656), run further tests to make sure everything works, and maybe do some performance checks and optimizations before marking this ready.
@ggerganov I fixed two bugs while implementing this (https://github.com/ggerganov/ggml/pull/904/commits/fd01e5d07f1c5e0d022aa239467d23a61c7cff45 and https://github.com/ggerganov/ggml/pull/904/commits/ecc1f514bd795e6acaea13bfce85078afdc2e112), can I just cherry-pick those into a llama.cpp PR or would that cause issues with the repo synchronization?
Edit: Also https://github.com/ggerganov/ggml/pull/904/commits/577b13257250af931dac307af0924c810c202953
It's easier to merge in one repo and sync to the others. But if it's high priority, you can cherry-pick in llama.cpp and I'll resolve it later.
It doesn't seem to cause any significant issue on llama.cpp, so I'll wait for a sync unless someone opens an issue that would be fixed by this.
Btw, does this fix the following tests:
https://github.com/ggerganov/llama.cpp/pull/8613#issuecomment-2241766951
It should, yes. When refactoring the shader code into files, I set a preprocessor value incorrectly, which caused matmuls to fail when k is not divisible by 8.
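For illustration only (the real kernels are GLSL compute shaders; this C++ sketch just shows the shape of the logic): the inner loop consumes k in blocks of 8, and a preprocessor value has to enable a separate tail path for the remainder. With the value set wrong, the tail is effectively compiled out.

```cpp
// Illustrative sketch, not the actual Vulkan shader: a dot product that
// processes k in blocks of 8 plus an explicit tail. If the tail path is
// compiled out by a wrong preprocessor value, results are incorrect
// whenever k % 8 != 0.
float dot_blocked(const float * a, const float * b, int k) {
    float sum = 0.0f;
    const int k8 = (k / 8) * 8;
    for (int i = 0; i < k8; i += 8) {   // main loop: 8 elements per step
        for (int j = 0; j < 8; j++) {
            sum += a[i + j] * b[i + j];
        }
    }
    for (int i = k8; i < k; i++) {      // tail: handles k not divisible by 8
        sum += a[i] * b[i];
    }
    return sum;
}
```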
I think I caught all of the major issues now, stable-diffusion.cpp works with Vulkan with these changes on AMD and Nvidia.
It doesn't look ready yet; the latest commit crashes every time for me with settings that worked before:
ggml_extend.hpp:939 - clip compute buffer size: 1.40 MB(VRAM)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed
Please always include which model you are running and the command you used.
My bad. Didn't have time to test thoroughly at the time.
After some further testing I've determined the source of the problem to be quantization. Here is an example command:
sd.exe -p "A lovely cat" -m "v1-5-pruned-emaonly.ckpt" --type q8_0
Log:
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1660 SUPER (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
[INFO ] stable-diffusion.cpp:176 - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt'
[INFO ] model.cpp:744 - load D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt using checkpoint format
[INFO ] stable-diffusion.cpp:199 - Stable Diffusion 1.x
[INFO ] stable-diffusion.cpp:205 - Stable Diffusion weight type: q8_0
[INFO ] stable-diffusion.cpp:427 - total params memory size = 1618.48MB (VRAM 1618.48MB, RAM 0.00MB): clip 125.20MB(VRAM), unet 1398.81MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:431 - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt' completed, taking 19.80s
[INFO ] stable-diffusion.cpp:451 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:569 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1028 - apply_loras completed, taking 0.00s
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed
@SkutteOleg Thank you for the report. I messed up one of the conditions for selecting a quantized matmul shader; that's fixed now, can you try again?
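To connect this to the assertion in the log: a hedged sketch of what GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) guards. Only the asserted expression comes from ggml-vulkan.cpp; the struct and function around it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative sketch of the failing check. d_X is the device buffer for
// matrix X, x_sz the byte size of one matrix slice, and ne02/ne03 the batch
// dimensions of the tensor.
struct vk_buffer { size_t size; };

void check_x_buffer(const vk_buffer * d_X, size_t x_sz, int64_t ne02, int64_t ne03) {
    // Selecting the wrong (non-quantized) matmul shader changes x_sz, so the
    // pre-allocated buffer can end up smaller than required and this fires.
    assert(d_X->size >= x_sz * ne02 * ne03);
}
```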
I forgot to check img2img, GGML_OP_PAD was missing for that. I added it now.
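For reference, a minimal sketch of the semantics GGML_OP_PAD needs, shown for the 2D case with zero padding (names and layout are illustrative, not the actual Vulkan implementation):

```cpp
#include <cstdint>
#include <vector>

// Sketch of GGML_OP_PAD semantics in the 2D case: the destination is the
// source extended with zeros to the larger shape.
std::vector<float> pad_2d(const std::vector<float> & src,
                          int64_t ne0, int64_t ne1,    // source shape
                          int64_t pe0, int64_t pe1) {  // padded shape, pe >= ne
    std::vector<float> dst(pe0 * pe1, 0.0f);           // zero-initialized
    for (int64_t i1 = 0; i1 < ne1; i1++) {
        for (int64_t i0 = 0; i0 < ne0; i0++) {
            dst[i1 * pe0 + i0] = src[i1 * ne0 + i0];   // copy source region
        }
    }
    return dst;
}
```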
Works great, thank you! The issue I was having where 1024x1024 would produce broken outputs is gone too. I was also seeing Vulkan output that looked blotchy and noisy compared to CUDA12; that's fixed as well, to the point where the CUDA12 images now look noisier to me.
All my use cases are covered, great job!
Nice, should we proceed with the merge?
I will add LEAKY_RELU (https://github.com/leejet/stable-diffusion.cpp/pull/291#issuecomment-2266991167) in the next few hours, then we can merge.
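For reference, leaky ReLU is a simple element-wise op; a minimal C++ sketch of the math the shader has to compute per element (the negative_slope parameter mirrors the one ggml's CPU-side leaky ReLU takes):

```cpp
// Element-wise definition of leaky ReLU: identity for positive inputs,
// scaled by negative_slope otherwise. Math sketch only, not the shader.
float leaky_relu(float x, float negative_slope) {
    return x > 0.0f ? x : negative_slope * x;
}
```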