stable-diffusion.cpp

optimize ggml_ext_chunk

Open leejet opened this issue 1 week ago • 5 comments

leejet avatar Dec 12 '25 17:12 leejet

0835e5c22727981947eda0f6cfaf16b96b3aed25 broke sd1.5:

| master-408 | 0835e5c |
| --- | --- |
| [image: teste_1765561010] | [image: teste_1765560820] |

wbruna avatar Dec 12 '25 17:12 wbruna

@wbruna Oh, you're right. I was only looking at the speed.

stduhpf avatar Dec 12 '25 17:12 stduhpf

> 0835e5c broke sd1.5:

Same on SDXL.

daniandtheweb avatar Dec 12 '25 18:12 daniandtheweb

Testing each version on SD1.5: when compared with 59ebdf0, #1079 seems almost as fast on Vulkan, and around 9% slower on ROCm. The ggml_ext_chunk suggested above is ~3-4% slower on both:

| version | vulkan | rocm |
| --- | --- | --- |
| 59ebdf0 | 2.65s/it | 2.34s/it |
| 347710f (and current master) | 3.65s/it | 3.44s/it |
| ggml_ext_chunk above | 2.75s/it | 2.41s/it |
| #1079 | 2.69s/it | 2.54s/it |
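
For anyone who wants to check the percentages quoted with the table above, here is a minimal sketch that recomputes them from the posted timings. The numbers are copied from the table; the names `timings`, `slowdown_percent` and `slowdown_table` are made up for this illustration and are not part of stable-diffusion.cpp.

```python
# Seconds per iteration, copied from the table above.
timings = {
    "59ebdf0": {"vulkan": 2.65, "rocm": 2.34},
    "347710f (and current master)": {"vulkan": 3.65, "rocm": 3.44},
    "ggml_ext_chunk above": {"vulkan": 2.75, "rocm": 2.41},
    "#1079": {"vulkan": 2.69, "rocm": 2.54},
}

BASELINE = "59ebdf0"  # the version every other row is compared against


def slowdown_percent(version, backend):
    """Percent increase in s/it relative to the baseline (positive means slower)."""
    base = timings[BASELINE][backend]
    return (timings[version][backend] / base - 1.0) * 100.0


def slowdown_table():
    """Slowdown of every non-baseline version, rounded to one decimal place."""
    return {
        version: {b: round(slowdown_percent(version, b), 1) for b in row}
        for version, row in timings.items()
        if version != BASELINE
    }


if __name__ == "__main__":
    for version, row in slowdown_table().items():
        print(f"{version}: vulkan +{row['vulkan']}%, rocm +{row['rocm']}%")
```

With these inputs, #1079 comes out about 1.5% slower on Vulkan and 8.5% slower on ROCm, and the ggml_ext_chunk variant about 3.8% and 3.0%, which matches the rounded figures in the comment. Current master is roughly 38% (Vulkan) and 47% (ROCm) slower than 59ebdf0.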

wbruna avatar Dec 12 '25 20:12 wbruna

> 0835e5c broke sd1.5:
>
> | master-408 | 0835e5c |
> | --- | --- |
> | [image: teste_1765561010] | [image: teste_1765560820] |

It looks like the CUDA and Vulkan backend implementations differ slightly. I was able to reproduce the problem with the Vulkan backend, but everything works fine with the CUDA backend.
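
One way to turn "broken on one backend, fine on another" into a number is to generate with the same model, prompt and seed on both backends and compare the outputs element by element. The sketch below assumes the two outputs have already been loaded as flat sequences of floats. How they are dumped or loaded is not covered in this thread. The function names and the tolerance are hypothetical choices for illustration, not part of stable-diffusion.cpp or ggml.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a, b, tol=1e-2):
    """True if the outputs agree within tol; any NaN counts as a mismatch.

    tol is an arbitrary illustrative threshold; small differences between
    backends are expected from floating-point rounding alone.
    """
    if any(math.isnan(v) for v in a) or any(math.isnan(v) for v in b):
        return False
    return max_abs_diff(a, b) <= tol


if __name__ == "__main__":
    # Made-up values standing in for outputs from two backends.
    reference = [0.10, 0.20, 0.30, 0.40]
    candidate = [0.10, 0.20, 0.35, 0.40]
    print("max abs diff:", max_abs_diff(reference, candidate))
    print("match:", outputs_match(reference, candidate))
```

A visibly broken image like the one reported above would typically show a difference far beyond any reasonable tolerance, while a correct but differently optimized kernel should stay close to the reference.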

leejet avatar Dec 13 '25 05:12 leejet