stable-diffusion.cpp
SYCL Backend broken for Intel GPUs (GGML issue?)
This is on Ubuntu 24.04 LTS, with all drivers and oneAPI toolkits updated to the latest versions available at the time of writing. When I compile from source following the instructions in this GitHub repository, I get the following error:
[INFO ] stable-diffusion.cpp:1199 - apply_loras completed, taking 0.00s
No kernel named _ZTSZL13get_rows_syclILi32ELi2EXadL_ZL15dequantize_q4_0PKvliRN4sycl3_V13vecINS3_6detail9half_impl4halfELi2EEEEEEvR25ggml_backend_sycl_contextPK11ggml_tensorSE_PSC_S1_PKiPfPNS3_5queueEEUlNS3_7nd_itemILi3EEEE_ was found
Exception caught at file:/home/max/sd.cpp/stable-diffusion.cpp/ggml/src/ggml-sycl/common.cpp, line:102
I read in another SYCL-related issue that trying a newer version of GGML might help, so I did: I copied the ggml folder from my llama.cpp checkout (master branch at the time of writing) and recompiled.
Sadly, that build also fails at around the same point:
[DEBUG] ggml_extend.hpp:1075 - clip params backend buffer size = 66.62 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1075 - unet params backend buffer size = 1272.89 MB(VRAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1075 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:413 - loading weights
[DEBUG] model.cpp:1645 - loading tensors from SomethingV2_2.safetensors
/home/max/sd.cpp/stable-diffusion.cpp/ggml/src/ggml.c:6299: GGML_ASSERT(result == nrows * row_size) failed
Is this a known issue?
Try updating the ggml types in stable-diffusion.h: https://github.com/leejet/stable-diffusion.cpp/pull/509/files#diff-76ca3df0d9402626563aea06283bbc5264e1b46d596312a623ea93ed744f2897R98
I had a similar issue after updating GGML on the Vulkan and CPU backends, and this fixed it.
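To illustrate why stale type values could matter here: assuming the public header keeps its own copy of the library's type enum (the enum and function names below are hypothetical stand-ins, not the real declarations from either project), that copy can silently drift when the library adds or renumbers entries. A minimal sketch of a compile-time check that would catch such drift:

```cpp
// Hypothetical stand-in for the type enum owned by the backend library.
enum backend_type {
    BACKEND_TYPE_F32  = 0,
    BACKEND_TYPE_F16  = 1,
    BACKEND_TYPE_Q4_0 = 2,
    BACKEND_TYPE_Q8_0 = 8,
};

// Hypothetical stand-in for the mirrored copy exposed in the public header.
enum api_type {
    API_TYPE_F32  = 0,
    API_TYPE_F16  = 1,
    API_TYPE_Q4_0 = 2,
    API_TYPE_Q8_0 = 8,
};

// Compare one mirrored value with its original by numeric value.
constexpr bool same_value(api_type a, backend_type b) {
    return static_cast<int>(a) == static_cast<int>(b);
}

// True only if every mirrored entry still matches the library's value.
constexpr bool type_enums_in_sync() {
    return same_value(API_TYPE_F32,  BACKEND_TYPE_F32)  &&
           same_value(API_TYPE_F16,  BACKEND_TYPE_F16)  &&
           same_value(API_TYPE_Q4_0, BACKEND_TYPE_Q4_0) &&
           same_value(API_TYPE_Q8_0, BACKEND_TYPE_Q8_0);
}

// Fails the build, instead of failing at runtime, if the copy has drifted.
static_assert(type_enums_in_sync(),
              "public type enum is out of sync with the backend library");
```

With a check like this, swapping in a newer library folder whose enum has changed would stop the build with a clear message rather than surfacing later as a size or row-count assertion while loading weights.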
Thank you for the quick response!
I tried out your repo and compiled it as usual, but unfortunately it fails with the same error as the first one I pasted, the "No kernel named ... was found" message.
I tried several quantization types (f16, q8_0, and q4_0), but none of the three worked; each printed a similar error about a missing kernel for dequantize_q...
This looks like a regression somewhere; commit 1c168d98a5a47aaf6d9b2c7f3a23e3166c59a6ec works.
Commit 19d876ee300a055629926ff836489901f734f2b7 seems to be working.