[Feature] NPU backend support, in addition to the existing CPU, Vulkan, OpenCL, etc.
Feature Summary
NPU support for the backend
Detailed Description
Add NPU support to the backend, in addition to the existing CPU, Vulkan, and OpenCL backends.
Alternatives you considered
No response
Additional context
No response
I suggest you open a request on the llama.cpp project, since ggml (our backend library) is developed primarily there.
https://github.com/FastFlowLM/FastFlowLM
As far as I know, they are the only ones with a well-optimized inference engine for NPUs (specifically XDNA2), and it’s impressive what they achieve within a TDP below 2 W. It makes me think that AMD should focus resources on ASICs and the software ecosystem for them.
Hey, I would love some help. I am working on an Android app to run a local quantized SDXL model.
I tried stable-diffusion.cpp, but since it has no NPU support it takes a huge amount of time. I am working on a Samsung Exynos processor; could you suggest any other alternative I could use? Anyway, thanks for the reply.
MNN has NPU acceleration, but only for the latest Snapdragon chips, I think. You should look into Samsung's documentation if you plan to study how to implement it.
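For reference, one common route from an Android app to whatever NPU driver the vendor exposes is the NNAPI delegate in TensorFlow Lite. The sketch below is only an illustration under assumptions that are not from this thread: it presumes you already have a converted `.tflite` model with a single float32 input and output, and whether Samsung's Exynos NNAPI driver actually accelerates that model (rather than falling back to the CPU) has to be verified on the device itself.

```kotlin
// Minimal sketch: hand a TFLite model to the device accelerator via the NNAPI delegate.
// Gradle: org.tensorflow:tensorflow-lite (the NNAPI delegate ships with the core runtime).
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.nnapi.NnApiDelegate
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

fun runOnNpu(modelFile: File, input: FloatArray, outputSize: Int): FloatArray {
    // NNAPI routes supported ops to the vendor's accelerator driver (NPU/DSP/GPU);
    // anything unsupported silently falls back to the CPU.
    val nnApiDelegate = NnApiDelegate()
    val options = Interpreter.Options().addDelegate(nnApiDelegate)

    // The Interpreter expects the .tflite model in a direct (or memory-mapped) ByteBuffer.
    // For simplicity this sketch reads the whole file in one call.
    val modelBuffer = ByteBuffer.allocateDirect(modelFile.length().toInt())
        .order(ByteOrder.nativeOrder())
    modelFile.inputStream().channel.use { it.read(modelBuffer) }
    modelBuffer.rewind()

    // Assumes one float32 input of shape [1, input.size] and one float32 output
    // of shape [1, outputSize]; adjust to match the actual model signature.
    val output = Array(1) { FloatArray(outputSize) }
    Interpreter(modelBuffer, options).use { interpreter ->
        interpreter.run(arrayOf(input), output)
    }
    nnApiDelegate.close()
    return output[0]
}
```

The catch, as with MNN, is coverage: a diffusion model like SDXL contains operators the vendor driver may not support, so profiling on the target phone is the only way to confirm the NPU is doing the work instead of the CPU fallback.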