Add the possibility to quantize MatMul per-tensor when per_channel=True
Description: When quantizing a model with per_channel=True, it should be possible to quantize linear layers in a per-tensor way, since per-feature quantization does not make sense for them. This PR adds that functionality for the MatMul operator: users just have to specify extra_options["QDQOpTypePerChannelSupportToAxis"]["MatMul"] = None to quantize all layers per-channel except the linear ones.
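A minimal sketch of how this option could be used with quantize_static, assuming a user-provided CalibrationDataReader; the model paths, input name, and shape below are placeholders, and the random calibration data is for illustration only:

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    quantize_static,
)


class RandomDataReader(CalibrationDataReader):
    """Feeds a few random batches for calibration (placeholder data)."""

    def __init__(self, input_name, shape, num_batches=8):
        self._data = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._data, None)


quantize_static(
    "model.onnx",         # placeholder input model path
    "model.quant.onnx",   # placeholder output model path
    RandomDataReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QDQ,
    per_channel=True,     # Conv weights stay per-channel
    extra_options={
        # Setting the MatMul entry to None requests per-tensor
        # quantization for MatMul even though per_channel=True.
        "QDQOpTypePerChannelSupportToAxis": {"MatMul": None},
    },
)
```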
Motivation and Context
- Why is this change required? The features of a linear layer are not independent, so per-channel quantization of its weights is not meaningful. We should therefore be able to quantize convolutional layers per-channel and linear layers per-tensor at the same time.
- It fixes #10283 and #11890.
@yufenglee @chilo-ms any feedback on this PR?
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
@yufenglee @chilo-ms
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline
Azure Pipelines successfully started running 6 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 9 pipeline(s).