
[FEATURE] Add Qualcomm Hexagon NPU delegation for hardware acceleration

Saymond opened this issue 6 months ago · 3 comments

The application currently supports model inference on the CPU and GPU. That covers most devices, but modern Android flagships, particularly those with recent Snapdragon SoCs (systems on a chip), also feature a powerful and efficient Hexagon NPU (Neural Processing Unit) that the app does not use. This is a missed opportunity for significant performance and efficiency gains on high-end devices.

I would like to request the implementation of a new delegation option in the app to use the Qualcomm Hexagon NPU for model inference on supported Snapdragon processors. This would allow the app to offload AI workloads to this specialized hardware, resulting in much faster processing speeds and lower power consumption compared to using the CPU or even the GPU.

The current alternatives, CPU and GPU inference, are functional but not as well suited to sustained AI workloads as a dedicated NPU. For complex models this means slower performance and higher battery drain, which matters on a mobile device.

Modern chipsets like the Snapdragon 8 Gen 2 and especially the Snapdragon 8 Gen 3 have incredibly capable NPUs. For instance, Qualcomm's own data indicates the Hexagon NPU in the Snapdragon 8 Gen 3 is up to 98% faster than its predecessor and offers a 40% improvement in performance per watt for AI tasks.

These NPUs are capable of running large language models at impressive speeds, with reports showing up to 20 tokens per second for on-device models. This level of performance would dramatically improve the user experience within the Edge Gallery app, making interactions smoother and more responsive.

Qualcomm provides the necessary developer resources, such as the Qualcomm AI Engine Direct SDK (QNN) and the Hexagon SDK, to enable this functionality. Implementing it would make Edge Gallery one of the most performant on-device AI applications available and truly leverage the full capabilities of modern hardware.

Saymond · Jun 07 '25

@Saymond Thank you for raising this feature request! We appreciate you taking the time to provide so much information. We will keep tracking this feature request and share it with the team. Thanks for the input!

dpknag · Jun 11 '25

I agree! Adding NPU support could significantly boost performance on higher-end Snapdragon SoCs. Definitely would love to see this in the future!

jakeboardman47 · Jun 15 '25

Agreed!

I use Layla AI on Android, which runs offline LLMs on the NPU.

It's much faster than the GPU, regardless of whether the GPU backend is Vulkan or OpenCL.

Not only inference: loading the model is much quicker too.

SuperPauly · Jul 24 '25