stable-diffusion.cpp sync: sync with latest ggml

Sync with the latest ggml and integrate the amazing stable-diffusion.cpp into a standard Android app for text-to-image on Android phones.

Validated on x86 Linux and on Android phones equipped with Snapdragon 8 Gen 3 and Snapdragon 8 Elite.


Btw, I suggest that all internal and public non-static functions be given the prefix "sd_".
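A minimal sketch of what that naming convention could look like. The function names here are hypothetical and are not taken from stable-diffusion.cpp; the step limits are arbitrary values chosen only for illustration.

```cpp
#include <algorithm>

// Internal helper: `static` gives it internal linkage, so it is invisible to
// other translation units and cannot clash with symbols from other libraries.
// Under the suggestion above it would not need a prefix.
static int clamp_steps(int steps) {
    // Arbitrary illustrative bounds, not real library limits.
    return std::max(1, std::min(steps, 150));
}

// Non-static function: it has external linkage, so the "sd_" prefix keeps it
// from colliding with same-named symbols when linked into a larger app.
// Hypothetical name, used only to show the convention.
int sd_normalize_steps(int steps) {
    return clamp_steps(steps);
}
```

The point of the prefix is that only symbols with external linkage can collide at link time, which matters once the library is linked into an app alongside other ggml-based code.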

May 05 '25 07:05 jeffzhou2000

@zhouwg Thanks for this and for your work on QNN. Did you try to run inference on this backend? And how is the performance so far compared to CPU?

I'm working on something similar for Android, Local-Diffusion, and I'm looking forward to adding the QNN backend once it is available.

May 05 '25 08:05 rmatif

Thanks for your interest in ggml-hexagon (ggml-qnn) for llama.cpp. Currently, Stable Diffusion inference via the Hexagon cDSP is not supported on Android phones.

Your Local-Diffusion project seems very interesting and powerful. I recently integrated stable-diffusion.cpp into my on-device AI research project in order to fix an open issue: https://github.com/kantv-ai/kantv/issues/301. All the work on that issue can be seen in the next commit in that research project.

I submitted the ggml-hexagon/ggml-qnn PR to the upstream llama.cpp community on 03/11/2025. Unfortunately, there seems to have been no positive feedback on that PR, and I don't know why. I also hope the ggml-hexagon backend can become available in upstream llama.cpp.

May 05 '25 09:05 jeffzhou2000

@zhouwg I saw two WIP QNN backends on the llama.cpp repo and didn't understand why it was like that.

And for stable-diffusion support on cDSP, what is the current limitation that doesn't make inference possible? I saw that you already have a matmul kernel so theoretically it should be possible to at least run these ops. Is it due to the 2GB memory pool max?

May 05 '25 10:05 rmatif

> @zhouwg I saw two WIP QNN backends on the llama.cpp repo and didn't understand why it was like that.

This is a really good question. The other WIP QNN backend is a hard-forked candidate PR from a Chinese C++ programmer, based on my original PR from 04/26/2024.

Please refer to the tech docs and posts in the ggml-hexagon project for more technical details: https://github.com/zhouwg/ggml-hexagon/discussions

> And for stable-diffusion support on cDSP, what is the current limitation that doesn't make inference possible? I saw that you already have a matmul kernel so theoretically it should be possible to at least run these ops. Is it due to the 2GB memory pool max?

Stable Diffusion support on cDSP is not difficult; it is missing only because I have been busy working on that open issue recently. I'll add Stable Diffusion support on cDSP later (I guess the performance will be poorer than the default ggml backend because of some technical and non-technical factors), after I merge the PR that integrates stable-diffusion.cpp for real-time text-to-image in the online-TV scenario on Android phones in that research project.

May 05 '25 10:05 jeffzhou2000

Isn't this missing the actual ggml sync? Something like in the screenshot should be shown in the diff, right?

May 05 '25 13:05 idostyle

Thanks for your review and correction! I'll refine it accordingly.

May 06 '25 02:05 jeffzhou2000

> And for stable-diffusion support on cDSP, what is the current limitation that doesn't make inference possible? I saw that you already have a matmul kernel so theoretically it should be possible to at least run these ops. Is it due to the 2GB memory pool max?

You are correct!

I added Stable Diffusion inference on cDSP in this PR: https://github.com/kantv-ai/kantv/pull/307. Unfortunately, Stable Diffusion inference on cDSP cannot work correctly as expected, because the backend scheduler feature used in llama.cpp is not currently supported in stable-diffusion.cpp. Please refer to: https://github.com/leejet/stable-diffusion.cpp/issues/671
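To illustrate why the missing backend scheduler matters, here is a toy sketch of the general idea: each op in a graph is placed on the first backend that supports it, with a catch-all fallback. This is not ggml's actual scheduler or API. `ToyBackend`, `toy_schedule`, and the op labels are hypothetical names for illustration, and which ops a cDSP backend supports is an assumption.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a compute backend; these are not ggml types.
struct ToyBackend {
    std::string name;
    std::unordered_set<std::string> supported_ops;

    bool supports(const std::string& op) const {
        return supported_ops.count(op) > 0;
    }
};

// Place each op on the first backend (in priority order) that supports it.
// The last backend in the list is treated as the catch-all fallback,
// for example the CPU. Returns one backend name per op.
std::vector<std::string> toy_schedule(const std::vector<std::string>& graph_ops,
                                      const std::vector<ToyBackend>& backends) {
    std::vector<std::string> placement;
    if (backends.empty()) {
        return placement;  // nothing to schedule onto
    }
    placement.reserve(graph_ops.size());
    for (const auto& op : graph_ops) {
        std::string chosen = backends.back().name;  // fallback backend
        for (const auto& backend : backends) {
            if (backend.supports(op)) {
                chosen = backend.name;
                break;
            }
        }
        placement.push_back(chosen);
    }
    return placement;
}
```

With a scheduler of this kind, a backend that implements only a few kernels (say, matmul) can still take part in a full graph because everything else falls back to another backend. Without it, the whole graph has to run on a single backend, so any unsupported op breaks inference.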

May 06 '25 03:05 jeffzhou2000