Add the possibility to quantize MatMul per-tensor when per_channel=True
Description: When quantizing a model with per_channel=True, it should be possible to quantize linear layers in a per-tensor way, since per-feature quantization does not make sense for them. This PR adds that functionality for the MatMul operator: users just have to specify extra_options["QDQOpTypePerChannelSupportToAxis"]["MatMul"] = None to quantize all layers per-channel except the linear ones.
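A minimal sketch of how this option could be used with quantize_static, assuming a user-provided CalibrationDataReader; the model paths, input name, and shape below are placeholders, and the random calibration data is for illustration only:

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    quantize_static,
)


class RandomDataReader(CalibrationDataReader):
    """Feeds a few random batches for calibration (placeholder data)."""

    def __init__(self, input_name, shape, num_batches=8):
        self._data = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._data, None)


quantize_static(
    "model.onnx",         # placeholder input model path
    "model.quant.onnx",   # placeholder output model path
    RandomDataReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QDQ,
    per_channel=True,     # Conv weights stay per-channel
    extra_options={
        # Setting the MatMul entry to None requests per-tensor
        # quantization for MatMul even though per_channel=True.
        "QDQOpTypePerChannelSupportToAxis": {"MatMul": None},
    },
)
```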
Motivation and Context
- Why is this change required? The features of a linear layer are not independent, so per-channel quantization of its weights is not meaningful. We should therefore be able to quantize convolutional layers per-channel and linear layers per-tensor at the same time.
- It fixes #10283 and #11890.
@yufenglee @chilo-ms any feedback on this PR?
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
@yufenglee @chilo-ms
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline
Azure Pipelines successfully started running 6 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 9 pipeline(s).