
Add support for joint quantization and graph optimization of the models

jegork opened this issue 3 years ago · 0 comments

Feature request

I have noticed that it is not yet possible to export a model to ONNX with both graph optimization and quantization applied. Are there any plans to add this?

Motivation

For cases like CPU inference, it would be nice to benefit from both quantization and graph optimization, which appears to be possible with ONNX Runtime.

Your contribution

I can try submitting a PR; however, it might be complicated to integrate this with the existing ORTQuantizer and ORTOptimizer. So far I have managed to apply dynamic quantization after graph optimization using the following (sketch code, I can add the full code on request):

from onnxruntime.quantization import QuantType, quantize_dynamic

# First export the graph-optimized model with an ORTOptimizer instance
# (optimizer, the paths, and optimization_config are created beforehand).
optimizer.export(
    onnx_model_path=onnx_model_path,
    onnx_optimized_model_output_path=onnx_optimized_model_path,
    optimization_config=optimization_config,
)

# Then apply dynamic int8 quantization to the optimized graph.
quantize_dynamic(
    onnx_optimized_model_path,
    onnx_quantized_model_path,
    weight_type=QuantType.QInt8,
)

jegork · Aug 19 '22 20:08