[QNN-EP] Add MatMulNBits translation for GPU

Open quic-tirupath opened this issue 2 months ago • 10 comments

Description

Add support for translating the MatMulNBits contrib op to the QNN FullyConnected operation with INT4 block-quantized weights.

Implementation details:

  • Translate MatMulNBits to FullyConnected in OpBuilder
  • Support QNN_QUANTIZATION_ENCODING_BLOCK for INT4 weights
  • Pass INT4 weights and quant params as BlockQuantization encoding params in QNN
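
For readers unfamiliar with block quantization, the arithmetic the translated op has to reproduce can be sketched in plain Python. This is an illustrative reference only, not the OpBuilder code from this PR: the function names are hypothetical, and the low-nibble-first packing order and the default zero point of 8 are assumptions made for the sketch.

```python
def unpack_int4(packed):
    """Split each byte into two unsigned 4-bit values, low nibble first (assumed order)."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append((byte >> 4) & 0x0F)
    return out


def dequantize_row(packed_row, scales, block_size, zero_point=8):
    """Dequantize one output channel: w = (q - zero_point) * scale of the block q falls in."""
    q = unpack_int4(packed_row)
    return [(v - zero_point) * scales[i // block_size] for i, v in enumerate(q)]


def matmul_nbits_reference(a, packed_b, scales, block_size, zero_point=8):
    """Compute Y[m][n] = sum_k A[m][k] * W[n][k], with W dequantized block by block."""
    weights = [dequantize_row(row, s, block_size, zero_point)
               for row, s in zip(packed_b, scales)]
    return [[sum(x * w for x, w in zip(a_row, w_row)) for w_row in weights]
            for a_row in a]


# Toy case: K = 4 inputs, N = 1 output channel, block_size = 2 (two scales per row).
a = [[1.0, 2.0, 3.0, 4.0]]
packed_b = [[0x98, 0x7A]]  # unpacks to the nibbles 8, 9, 10, 7
scales = [[0.5, 2.0]]
y = matmul_nbits_reference(a, packed_b, scales, block_size=2)
```

Each block of `block_size` consecutive weights shares one scale, which is the per-block information that a block-quantization encoding carries alongside the INT4 weights.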

Testing:

  • Added new unit tests for MatMulNBits (MNB) -> QNN-GPU
  • Validated all OnnxRuntime tests
  • Validated the following LLMs through Olive and ORT-GenAI execution flow
    • Llama 3.2 1B
    • Qwen2.5
    • DeepSeek-R1-Qwen 1.5b
    • Phi3.5-mini-instruct

Motivation and Context

Running LLMs through the INT4 quantization pass in Olive generates models that contain MatMulNBits contrib ops. To run these ops via QNN-EP, MatMulNBits is translated to the QNN FullyConnected op with INT4 weights.

quic-tirupath avatar Oct 17 '25 16:10 quic-tirupath

@chilo-ms Could you please trigger CI?

quic-tirupath avatar Oct 17 '25 16:10 quic-tirupath

/azp run Windows ARM64 QNN CI Pipeline

edgchen1 avatar Oct 17 '25 22:10 edgchen1

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Oct 17 '25 22:10 azure-pipelines[bot]

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows GPU Doc Gen CI Pipeline

edgchen1 avatar Oct 21 '25 00:10 edgchen1

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines[bot] avatar Oct 21 '25 00:10 azure-pipelines[bot]

@edgchen1 Please hold off on merging this pull request. There might be an issue with this change.

johnpaultaken avatar Oct 21 '25 12:10 johnpaultaken

We are seeing NaN outputs for Qwen and DeepSeek 1.5B using MatMulNBits, which were triaged to this change.

johnpaultaken avatar Oct 21 '25 12:10 johnpaultaken

We are seeing NaN outputs for Qwen and DeepSeek 1.5B using MatMulNBits, which were triaged to this change.

The NaN issue was identified and fixed in the QNN GPU backend. Confirmed that there are no issues with this PR.

skadaver-qti avatar Nov 03 '25 04:11 skadaver-qti

We are seeing NaN outputs for Qwen and DeepSeek 1.5B using MatMulNBits, which were triaged to this change.

The NaN issue was identified and fixed in the QNN GPU backend. Confirmed that there are no issues with this PR.

I tested with the next release of the QNN GPU backend, 2.40. The issue still seems to be present. Let's discuss offline and clarify things before this change is merged.

johnpaultaken avatar Nov 03 '25 07:11 johnpaultaken

@edgchen1 Please hold off on merging this pull request. There might be an issue with this change.

@edgchen1 Thanks for holding off. This issue is verified as fixed with QNN SDK 2.41. Please proceed with the merge.

johnpaultaken avatar Nov 19 '25 01:11 johnpaultaken

@edgchen1 Thanks for the review and suggestions. We addressed the comments and rebased the PR. Could you please review and approve the PR? Please trigger CI as well.

quic-tirupath avatar Dec 13 '25 03:12 quic-tirupath

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

edgchen1 avatar Dec 16 '25 21:12 edgchen1

Command 'Linux' is not supported by Azure Pipelines.

Supported commands

  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

azure-pipelines[bot] avatar Dec 16 '25 21:12 azure-pipelines[bot]

Azure Pipelines successfully started running 2 pipeline(s).

azure-pipelines[bot] avatar Dec 16 '25 22:12 azure-pipelines[bot]

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

edgchen1 avatar Dec 16 '25 22:12 edgchen1

Azure Pipelines successfully started running 2 pipeline(s).

azure-pipelines[bot] avatar Dec 16 '25 22:12 azure-pipelines[bot]

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows GPU Doc Gen CI Pipeline

edgchen1 avatar Dec 18 '25 19:12 edgchen1

Azure Pipelines successfully started running 4 pipeline(s).

azure-pipelines[bot] avatar Dec 18 '25 19:12 azure-pipelines[bot]