tensorflow-onnx
Fix QAT model converting
Converting a quantization-aware trained model from TF to ONNX has several issues:
- `QuantizeLinear` and `DequantizeLinear` are fused into the conv layer, but the downstream compiler (e.g., TensorRT) needs the Q/DQ layers to decide whether to use int8 or not. See issue #1972. We need to keep the Q/DQ layers unfused. `QuantizeLinear` and `DequantizeLinear` correspond to `FakeQuantWithMinMaxVars` in TensorFlow, so excluding it from `can_fold` in `tf_utils.py` can solve this (see the sketch after this list).
- Need to allow `narrow_range` in quantized nodes. TensorRT maps [min, max] to [-127, 127] (see Page 12), which requires 0 in fp32 to map to 0 in int8. Also see `narrow_range=True` in TensorRT/tools/tensorflow-quantization here. (A small numeric illustration follows after this list.)
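A minimal sketch of the first point, the kind of guard meant by "excluding it from `can_fold`". The real `can_fold` in `tf_utils.py` operates on TensorFlow nodes; the exact signature, the `node.op` attribute, and the surrounding folding logic shown here are assumptions for illustration only.

```python
# Ops that must survive constant folding so the exported ONNX graph keeps
# explicit QuantizeLinear/DequantizeLinear nodes for TensorRT.
_UNFOLDABLE_QUANT_OPS = {
    "FakeQuantWithMinMaxVars",
    # The per-channel/args FakeQuant variants may warrant the same treatment.
}

def can_fold(node):
    """Return True if this TF node may be folded into a constant (sketch)."""
    if node.op in _UNFOLDABLE_QUANT_OPS:
        return False          # keep the fake-quant node so Q/DQ stays explicit
    # ... the existing folding checks would continue here ...
    return True
```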
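To illustrate the second point, the snippet below computes int8 quantization parameters for a symmetric float range with and without `narrow_range`. The helper `qparams` and the example range are made up for this illustration; the arithmetic shows that only the narrow [-127, 127] grid gives a zero point of exactly 0, which is what TensorRT's symmetric int8 mapping expects.

```python
def qparams(rmin, rmax, narrow_range):
    """Affine int8 quantization parameters for the float range [rmin, rmax]."""
    # narrow_range drops the -128 level, making the integer grid symmetric.
    qmin, qmax = (-127, 127) if narrow_range else (-128, 127)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = qmin - rmin / scale   # ideal (unrounded) zero point
    return scale, zero_point

# Symmetric calibration range, e.g. max(|w|) = 6.0 for a weight tensor.
print(qparams(-6.0, 6.0, narrow_range=True))   # scale = 6/127, zero_point = 0.0   -> fp32 0 maps exactly to int8 0
print(qparams(-6.0, 6.0, narrow_range=False))  # scale = 12/255, zero_point = -0.5 -> fp32 0 falls between int8 levels
```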