tensorflow-onnx
Quantized model extra node emitted between Q-DQ pair
Describe the bug
When converting a quantized TFLite model to ONNX, extra nodes (e.g. Transpose, Reshape, etc.) are emitted between Q-DQ pairs. This prevents the ORT graph optimizer from effectively fusing operators and achieving good performance.
Original issue: https://github.com/microsoft/onnxruntime/issues/14707
e.g. TFLite model:
Converted ONNX model:
The Transpose node should be placed either before the QuantizeLinear node or after the DequantizeLinear node for the ORT graph optimizer to work.
TFLite model: https://github.com/microsoft/onnxruntime/files/10751803/quantized_tflite.zip
Converted ONNX model: https://github.com/microsoft/onnxruntime/files/10751800/quantized_onnx.zip
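For reference, here is a minimal post-processing sketch (not part of tf2onnx; file names are placeholders) that rewires `Q -> Transpose -> DQ` into `Transpose -> Q -> DQ` with the onnx Python API, producing the kind of "swapped" model ORT can fuse. It assumes per-tensor quantization, where Transpose commutes with QuantizeLinear:

```python
# Hypothetical workaround, not part of tf2onnx: move a Transpose that sits
# between a QuantizeLinear (Q) and a DequantizeLinear (DQ) in front of the Q,
# so ORT sees an adjacent Q-DQ pair it can fuse. Safe only for per-tensor
# quantization, where Transpose commutes with Q.
import onnx

model = onnx.load("quantized.onnx")  # placeholder path
graph = model.graph
producer = {out: n for n in graph.node for out in n.output}
swaps = []

for t in graph.node:
    if t.op_type != "Transpose":
        continue
    q = producer.get(t.input[0])
    if q is None or q.op_type != "QuantizeLinear":
        continue
    consumers = [n for n in graph.node if t.output[0] in n.input]
    if len(consumers) != 1 or consumers[0].op_type != "DequantizeLinear":
        continue
    dq = consumers[0]
    # Rewire X -> Q -> Transpose -> DQ  into  X -> Transpose -> Q -> DQ.
    t.input[0], q.input[0] = q.input[0], t.output[0]
    dq.input[0] = q.output[0]
    swaps.append((q, t))

# Restore topological order: each moved Transpose must now precede its Q.
nodes = list(graph.node)
for q, t in swaps:
    i = next(k for k, n in enumerate(nodes) if n is q)
    j = next(k for k, n in enumerate(nodes) if n is t)
    nodes[i], nodes[j] = nodes[j], nodes[i]
del graph.node[:]
graph.node.extend(nodes)

onnx.checker.check_model(model)
onnx.save(model, "quantized_swapped.onnx")
```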
Urgency
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 18.04*):
- TensorFlow Version:
- Python version:
- ONNX version (if applicable, e.g. 1.11*):
- ONNXRuntime version (if applicable, e.g. 1.11*):
To Reproduce
Screenshots
Additional context
Actually, this is a feature designed and implemented 2 years ago.
tf2onnx has an optimizer that pushes DequantizeLinear down so that most ops end up between a QuantizeLinear/DequantizeLinear pair (e.g. `Q -> DQ -> Transpose` is rewritten to `Q -> Transpose -> DQ`, so the Transpose runs on quantized data). I guess the motivation was to lower memory usage during inference.
Did you observe a big performance gap between the original ONNX file and the swapped ONNX file mentioned in https://github.com/microsoft/onnxruntime/issues/14707?
If there is a big performance gap, we probably need to consider whether this optimizer should be removed.
Yes, there is a huge performance drop when the separation of the Q-DQ pair prevents operator fusion from working. For example, in https://github.com/microsoft/onnxruntime/issues/14707, a very simple model ran more than twice as slow.
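For anyone who wants to reproduce the gap, a rough timing sketch with onnxruntime (the input name, shape, and dtype are placeholders; take them from the actual model):

```python
# Hypothetical benchmark sketch: compare average latency of the original and
# swapped models. Adjust the feed to the real model's input name/shape/dtype.
import time
import numpy as np
import onnxruntime as ort

def bench(path, feed, runs=100):
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    for _ in range(10):  # warm-up
        sess.run(None, feed)
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feed)
    return (time.perf_counter() - start) / runs

feed = {"input": np.random.rand(1, 224, 224, 3).astype(np.float32)}  # placeholder
for path in ["quantized.onnx", "quantized_swapped.onnx"]:
    print(path, f"{bench(path, feed) * 1000:.2f} ms/run")
```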
Hi folks, any update? @hoangtv2000