Add the possibility to quantize MatMul per-tensor when per_channel=True

Open regisss opened this issue 2 years ago • 6 comments

Description: When quantizing a model with per_channel=True, it should be possible to quantize linear layers per tensor, since quantizing them per feature does not make sense. This PR adds that option for the MatMul operator: users only have to set extra_options["QDQOpTypePerChannelSupportToAxis"]["MatMul"] = None to quantize all layers per channel except the linear ones.
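As a rough sketch of how the option described above might be passed, the snippet below assembles the keyword arguments for `onnxruntime.quantization.quantize_static`. The helper name `build_quantization_kwargs`, the file names, and `data_reader` are hypothetical placeholders, and the `None` value for `MatMul` reflects the behaviour proposed in this PR rather than anything confirmed for a released version.

```python
def build_quantization_kwargs(matmul_per_tensor: bool = True) -> dict:
    """Hypothetical helper: build per_channel/extra_options for quantize_static."""
    # Maps an operator type to the axis used for per-channel quantization.
    op_type_to_axis = {}
    if matmul_per_tensor:
        # Proposed in this PR: None means "do not quantize MatMul per channel".
        op_type_to_axis["MatMul"] = None
    return {
        "per_channel": True,
        "extra_options": {"QDQOpTypePerChannelSupportToAxis": op_type_to_axis},
    }


kwargs = build_quantization_kwargs()

# Assumed usage (needs onnxruntime, a model file and a calibration data reader;
# the paths and `data_reader` are placeholders):
#
# from onnxruntime.quantization import QuantFormat, quantize_static
# quantize_static("model.onnx", "model.quant.onnx", data_reader,
#                 quant_format=QuantFormat.QDQ, **kwargs)
```

Keeping the override in `extra_options` leaves `per_channel=True` in force for the other operators, so only MatMul is affected.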

Motivation and Context

  • Why is this change required? Linear layers are not independent across features. Thus, we should be able to quantize convolutional layers per channel and linear ones per tensor at the same time.
  • It fixes #10283 and #11890.
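
To make the per-tensor versus per-channel distinction concrete, here is a small illustrative sketch in plain Python. It assumes symmetric int8 quantization and a made-up 2x2 weight matrix; it is not the quantizer code from onnxruntime, only a picture of how many scales each scheme produces.

```python
def symmetric_scale(values, num_bits=8):
    """Scale that maps the largest absolute value onto the signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def per_tensor_scale(weight):
    """One scale shared by every element of the weight matrix."""
    return symmetric_scale([v for row in weight for v in row])


def per_channel_scales(weight, axis=0):
    """One scale per slice along `axis` (0 = rows, 1 = columns)."""
    channels = weight if axis == 0 else list(zip(*weight))
    return [symmetric_scale(channel) for channel in channels]


# Illustrative weights only: the second row is much smaller than the first.
weight = [[0.5, -1.27], [0.02, 0.0127]]

tensor_scale = per_tensor_scale(weight)          # a single float
row_scales = per_channel_scales(weight, axis=0)  # one float per row
col_scales = per_channel_scales(weight, axis=1)  # one float per column
```

Per-channel quantization gives each slice its own scale, which suits convolution filters; per-tensor quantization uses one scale for the whole matrix, which is what the PR asks to allow for MatMul.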

regisss avatar Jun 27 '22 11:06 regisss

@yufenglee @chilo-ms any feedback on this PR?

regisss avatar Jul 04 '22 07:07 regisss

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

ytaous avatar Jul 14 '22 20:07 ytaous

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

ytaous avatar Jul 14 '22 20:07 ytaous

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines[bot] avatar Jul 14 '22 20:07 azure-pipelines[bot]

Azure Pipelines successfully started running 8 pipeline(s).

azure-pipelines[bot] avatar Jul 14 '22 20:07 azure-pipelines[bot]

@yufenglee @chilo-ms

ytaous avatar Jul 15 '22 17:07 ytaous

CLA assistant check
All CLA requirements met.

ghost avatar Aug 19 '22 16:08 ghost

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

yufenglee avatar Aug 19 '22 17:08 yufenglee

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

yufenglee avatar Aug 19 '22 17:08 yufenglee

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Aug 19 '22 17:08 azure-pipelines[bot]

Azure Pipelines successfully started running 8 pipeline(s).

azure-pipelines[bot] avatar Aug 19 '22 17:08 azure-pipelines[bot]

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

yufenglee avatar Aug 19 '22 23:08 yufenglee

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines[bot] avatar Aug 19 '22 23:08 azure-pipelines[bot]