
[Backend][Relax] Add Intel GNA backend for NPU support

Open Aristide021 opened this issue 4 months ago • 5 comments

This adds an Intel GNA (Gaussian & Neural Accelerator) backend for TVM Relax, designed as a foundation for Intel NPU support. While GNA hardware shipped in several recent generations of Intel Core processors, this backend serves as a stepping stone toward Intel's current NPU path with OpenVINO runtime integration.

Features:

  • Pattern-based graph partitioning for GNA/NPU-compatible operations
  • JSON serialization approach enabling seamless NPU migration
  • Software emulation mode for testing without dedicated hardware
  • Support for dense/linear, 1D convolution, and ReLU operations
  • Automatic shape and dtype extraction for optimization
  • Comprehensive test coverage with CI integration

Supported operations:

  • Dense/Linear layers (relax.matmul)
  • 1D Convolution (relax.nn.conv1d)
  • ReLU activation (relax.nn.relu)

This implementation provides a clean, minimal pattern for backend development while preparing the foundation for Intel's recommended NPU acceleration path through TVM's compilation pipeline.
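The JSON hand-off mentioned above could look something like the sketch below. The schema and helper names here are purely illustrative assumptions, not the PR's actual serialization format:

```python
import json

# Hypothetical serializer for a partitioned subgraph: each node records the
# op name plus the shape/dtype metadata extracted during partitioning.
# The schema is illustrative only, not the backend's real JSON format.
def serialize_subgraph(nodes):
    return json.dumps({"version": 1, "nodes": nodes})

# A tiny dense -> relu subgraph, the kind of pattern the backend offloads
subgraph = [
    {"op": "relax.matmul", "shape": [1, 128], "dtype": "float32"},
    {"op": "relax.nn.relu", "shape": [1, 128], "dtype": "float32"},
]

blob = serialize_subgraph(subgraph)
decoded = json.loads(blob)
print(decoded["nodes"][0]["op"])  # relax.matmul
```

Because the runtime consumes an opaque JSON blob rather than backend-specific IR, swapping the GNA execution path for an NPU one should only require a new interpreter of the same blob.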

Aristide021 avatar Aug 09 '25 20:08 Aristide021

@Aristide021 Thanks for the PR! A couple of points and questions:

  1. Status of GNA vs NPU
    • The upstream GNA repo is archived and marked as not under active management. The OpenVINO docs also note that GNA is being discontinued and recommend using Intel's NPU as the low-power offload path on newer processors. Given that, would it make sense to position this backend as a stepping stone toward NPU (and/or clarify the long-term maintenance plan in the README/code comments)?
    • https://github.com/intel/gna
    • https://docs.openvino.ai/2023.3/openvino_docs_OV_UG_supported_plugins_GNA.html
  2. CI & Software Emulation Mode
    • According to the OpenVINO docs, the GNA plugin supports a Software Emulation Mode (CPU fallback) when GNA hardware isn't present. If we enable that in tests, we could run E2E coverage in our CI.
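A CPU-fallback path of the kind that emulation mode enables could be sketched as follows. Every name here is hypothetical; the real mode selection lives in the OpenVINO plugin, and this only models the idea of dispatching to reference implementations when no accelerator is found:

```python
# Hypothetical software-emulation fallback: when no GNA/NPU device is
# detected (as in CI), run each supported op with a plain-Python
# reference implementation instead of the hardware path.
def gna_device_available():
    return False  # pretend no accelerator is present, as in a CI runner

def relu(xs):
    return [max(0.0, x) for x in xs]

def dense(xs, weights):
    # weights is a list of rows; computes xs @ weights^T
    return [sum(x * w for x, w in zip(xs, row)) for row in weights]

def run_op(op, *args):
    if gna_device_available():
        raise NotImplementedError("hardware path not modeled in this sketch")
    # CPU reference implementations stand in for the accelerator
    table = {"relu": relu, "dense": dense}
    return table[op](*args)

print(run_op("relu", [-1.0, 2.0, -3.0]))  # [0.0, 2.0, 0.0]
```

With a fallback like this wired into the test harness, the E2E tests can assert numerical correctness on any CI machine and only exercise the hardware path when a device is actually present.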

I also think this backend can serve as a very good example for codegen in Relax. It shows a clean and minimal pattern: partitioning with basic ops, handing off to JSON, and keeping the implementation relatively lightweight. Adding a short HOWTO or developer note ("Writing a minimal Relax backend") that references this code could be very helpful for the community.

cc @tqchen @Hzfengsy @cbalint13

mshr-h avatar Aug 21 '25 13:08 mshr-h

Thanks for the review and the excellent points! You're correct about GNA being archived. I designed this backend as a stepping stone toward NPU support with OpenVINO runtime integration in mind. The JSON serialization approach should make the transition to Intel's current NPU path relatively straightforward.

For the CI integration with Software Emulation Mode, I think that's a great suggestion. I can add CPU fallback support to enable E2E testing without requiring actual GNA hardware.

I'd also be happy to add documentation positioning this as a foundation for NPU backends, and to include a developer guide if that would be helpful for the community.

I'll go ahead and update the PR description to clarify the NPU migration path. My next step will be to add CPU emulation support for testing. Please let me know if you have any other suggestions.

Aristide021 avatar Aug 22 '25 18:08 Aristide021

Thanks for the contribution. Given that GNA is archived, it perhaps does not make sense to maintain it in the main tree, and adding CI would also add extra overhead here. However, I agree that having generic tutorials for BYOC NPU would be useful; if we can have something that supports a current NPU, that would be great.

tqchen avatar Aug 24 '25 16:08 tqchen

I'd be happy to refactor this into a generic NPU tutorial targeting Intel's current NPU plugin. Should this live in the tutorials section or as a contrib module? I can adapt the JSON architecture for educational purposes.

Aristide021 avatar Aug 24 '25 17:08 Aristide021

I think starting as contrib is fine, and we can have a tutorial explanation pointing to the code.

tqchen avatar Aug 24 '25 19:08 tqchen