Vulkan Stable Diffusion Operators
I implemented the operators necessary for stable-diffusion.cpp to run using Vulkan. The corresponding PR is https://github.com/leejet/stable-diffusion.cpp/pull/291
Image generation works now, but I want to add some minor things for LoRA/TAESD (https://github.com/leejet/stable-diffusion.cpp/pull/291#issuecomment-2256572656), run further tests to make sure everything works, and maybe do some performance checks and optimizations before marking this ready.
@ggerganov I fixed two bugs while implementing this (https://github.com/ggerganov/ggml/pull/904/commits/fd01e5d07f1c5e0d022aa239467d23a61c7cff45 and https://github.com/ggerganov/ggml/pull/904/commits/ecc1f514bd795e6acaea13bfce85078afdc2e112), can I just cherry-pick those into a llama.cpp PR or would that cause issues with the repo synchronization?
Edit: Also https://github.com/ggerganov/ggml/pull/904/commits/577b13257250af931dac307af0924c810c202953
It's easier to merge in one repo and sync to the others. But if it's high priority, you can cherry-pick in llama.cpp and I'll resolve it later.
It doesn't seem to cause any significant issue on llama.cpp, so I'll wait for a sync unless someone opens an issue that would be fixed by this.
Btw, does this fix the following tests:
https://github.com/ggerganov/llama.cpp/pull/8613#issuecomment-2241766951
It should, yes. When refactoring the shader code into files, I set a preprocessor value incorrectly, which caused matmuls to fail when k is not divisible by 8.
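For illustration only (the real kernels are GLSL compute shaders; this C++ sketch just shows the shape of the logic): the inner loop consumes k in blocks of 8, and a preprocessor value has to enable a separate tail path for the remainder. With the value set wrong, the tail is effectively compiled out.

```cpp
// Illustrative sketch, not the actual Vulkan shader: a dot product that
// processes k in blocks of 8 plus an explicit tail. If the tail path is
// compiled out by a wrong preprocessor value, results are incorrect
// whenever k % 8 != 0.
float dot_blocked(const float * a, const float * b, int k) {
    float sum = 0.0f;
    const int k8 = (k / 8) * 8;
    for (int i = 0; i < k8; i += 8) {   // main loop: 8 elements per step
        for (int j = 0; j < 8; j++) {
            sum += a[i + j] * b[i + j];
        }
    }
    for (int i = k8; i < k; i++) {      // tail: handles k not divisible by 8
        sum += a[i] * b[i];
    }
    return sum;
}
```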
I think I caught all of the major issues now, stable-diffusion.cpp works with Vulkan with these changes on AMD and Nvidia.
It doesn't look ready yet; the latest commit crashes every time for me with settings that worked before:
ggml_extend.hpp:939 - clip compute buffer size: 1.40 MB(VRAM)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed
Please always include which model you are running and the command you used.
My bad. Didn't have time to test thoroughly at the time.
After some further testing I've determined the source of the problem to be quantization. Here is an example command:
sd.exe -p "A lovely cat" -m "v1-5-pruned-emaonly.ckpt" --type q8_0
Log:
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1660 SUPER (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
[INFO ] stable-diffusion.cpp:176 - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt'
[INFO ] model.cpp:744 - load D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt using checkpoint format
[INFO ] stable-diffusion.cpp:199 - Stable Diffusion 1.x
[INFO ] stable-diffusion.cpp:205 - Stable Diffusion weight type: q8_0
[INFO ] stable-diffusion.cpp:427 - total params memory size = 1618.48MB (VRAM 1618.48MB, RAM 0.00MB): clip 125.20MB(VRAM), unet 1398.81MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:431 - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt' completed, taking 19.80s
[INFO ] stable-diffusion.cpp:451 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:569 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1028 - apply_loras completed, taking 0.00s
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed
@SkutteOleg Thank you for the report. I messed up one of the conditions for selecting a quantized matmul shader; that's fixed now, can you try again?
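To connect this to the assertion in the log: a hedged sketch of what GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) guards. Only the asserted expression comes from ggml-vulkan.cpp; the struct and function around it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative sketch of the failing check. d_X is the device buffer for
// matrix X, x_sz the byte size of one matrix slice, and ne02/ne03 the batch
// dimensions of the tensor.
struct vk_buffer { size_t size; };

void check_x_buffer(const vk_buffer * d_X, size_t x_sz, int64_t ne02, int64_t ne03) {
    // Selecting the wrong (non-quantized) matmul shader changes x_sz, so the
    // pre-allocated buffer can end up smaller than required and this fires.
    assert(d_X->size >= x_sz * ne02 * ne03);
}
```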
I forgot to check img2img, GGML_OP_PAD was missing for that. I added it now.
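For reference, a minimal sketch of the semantics GGML_OP_PAD needs, shown for the 2D case with zero padding (names and layout are illustrative, not the actual Vulkan implementation):

```cpp
#include <cstdint>
#include <vector>

// Sketch of GGML_OP_PAD semantics in the 2D case: the destination is the
// source extended with zeros to the larger shape.
std::vector<float> pad_2d(const std::vector<float> & src,
                          int64_t ne0, int64_t ne1,    // source shape
                          int64_t pe0, int64_t pe1) {  // padded shape, pe >= ne
    std::vector<float> dst(pe0 * pe1, 0.0f);           // zero-initialized
    for (int64_t i1 = 0; i1 < ne1; i1++) {
        for (int64_t i0 = 0; i0 < ne0; i0++) {
            dst[i1 * pe0 + i0] = src[i1 * ne0 + i0];   // copy source region
        }
    }
    return dst;
}
```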
Works great, thank you! The issue I was having where 1024x1024 would produce broken outputs is gone too. I was also seeing Vulkan output that looked blotchy and noisy compared to CUDA12; that's fixed as well, to the point where the CUDA12 images now look noisier to me.
All my use cases are covered, great job!
Nice, should we proceed with the merge?
I will add LEAKY_RELU (https://github.com/leejet/stable-diffusion.cpp/pull/291#issuecomment-2266991167) in the next few hours, then we can merge.
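For reference, leaky ReLU is a simple element-wise op; a minimal C++ sketch of the math the shader has to compute per element (the negative_slope parameter mirrors the one ggml's CPU-side leaky ReLU takes):

```cpp
// Element-wise definition of leaky ReLU: identity for positive inputs,
// scaled by negative_slope otherwise. Math sketch only, not the shader.
float leaky_relu(float x, float negative_slope) {
    return x > 0.0f ? x : negative_slope * x;
}
```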