stable-diffusion.cpp
slow ggml_vec_dot_f16 operator on Android
Hi, @leejet
I compiled this project with CLBlast support and ran sd on my Android phone. It runs successfully, but it is quite slow, about 70 s per iteration. I profiled it with perf, converted the output to a flame graph, and found that ggml_vec_dot_f16 accounts for over 80% of the runtime. Does this op support Adreno GPU acceleration? What is the reason for this?
Thanks a lot~
I think it would be better to support a Vulkan backend for acceleration on Android devices, since ggml currently lacks good OpenCL support (it is even considered obsolete). Unfortunately, I don't know enough about Vulkan to implement the kernels for the operations (I started watching some videos a few weeks ago because I want to stop using OpenGL).
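
For context, a function that shows up in a perf CPU profile is running on the CPU, so the time reported for ggml_vec_dot_f16 is not being offloaded to the GPU. The sketch below illustrates the kind of work a half-precision dot product does. It is not ggml's actual code: the names `half_to_float` and `dot_f16_sketch` are hypothetical, and the plain scalar loop is an assumption made for readability.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one IEEE 754 binary16 value stored in a
 * uint16_t into a 32-bit float. Not taken from ggml. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                       /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            int shift = 0;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                shift++;
            }
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);   /* inf or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar dot product over two half-precision arrays.
 * Every element is converted to float and accumulated in float, so the
 * cost is one conversion pair plus one multiply-add per element. */
static float dot_f16_sketch(size_t n, const uint16_t *x, const uint16_t *y) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += half_to_float(x[i]) * half_to_float(y[i]);
    }
    return sum;
}
```

A loop like this is called once per output element of a matrix multiplication or convolution, which is why it can dominate a profile when no GPU backend takes over that work.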