tensorflow-onnx
Hardcoded UInt8 idtype in FakeQuantWithMinMaxArgs, unsupported in TensorRT
These two lines look odd to me... Why hard-code the input dtype to uint8? That type is unsupported in TensorRT. How can I avoid this problem when I convert a QAT int8 model to ONNX and then to TensorRT? https://github.com/onnx/tensorflow-onnx/blob/482330f9958eb45c805933f04e2b0a5c7a494f23/tf2onnx/onnx_opset/quantize.py#L57
It seems this comes from the PR commit "fix operator for fakequantize", and the output type constraint could be int8 according to the ONNX QuantizeLinear-13 spec. Need to do more investigation. Hi @xadupre, could you please take a look and share any suggestions on this issue? Thanks!
How do we choose between int8 and uint8? I don't remember whether TensorFlow gives enough information to make the choice. Otherwise, it is possible to add an option to the converter to force one type at conversion time. Another possibility is to change the type after the conversion is done with a rewriter (a sketch of that idea follows below).
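The post-conversion route could look roughly like the sketch below: it walks the exported graph and replaces any uint8 zero-point initializer on QuantizeLinear/DequantizeLinear nodes with the equivalent int8 one (values shifted by 128). This is not an existing tf2onnx rewriter; the function and file names are made up for illustration, and it does not handle the case where a DequantizeLinear reads a stored uint8 weight tensor (that tensor would need the same shift).

```python
import numpy as np
import onnx
from onnx import numpy_helper, TensorProto

def rewrite_zero_points_to_int8(model):
    """Replace uint8 zero points on Q/DQ nodes with equivalent int8 ones."""
    inits = {t.name: t for t in model.graph.initializer}
    for node in model.graph.node:
        if node.op_type not in ("QuantizeLinear", "DequantizeLinear"):
            continue
        if len(node.input) < 3:
            continue  # no explicit zero point, defaults to 0
        zp = inits.get(node.input[2])
        if zp is None or zp.data_type != TensorProto.UINT8:
            continue
        # A uint8 zero point z encodes the same real values as an int8
        # zero point z - 128, with the quantized values shifted by -128 too.
        shifted = numpy_helper.to_array(zp).astype(np.int32) - 128
        zp.CopyFrom(numpy_helper.from_array(shifted.astype(np.int8), zp.name))
    return model

model = onnx.load("qat_model.onnx")  # hypothetical input file
onnx.save(rewrite_zero_points_to_int8(model), "qat_model_int8.onnx")
```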
Hi @doomooo, sorry for the late reply. Could you please share your INT8 case or a simple reproduction script? We will try to find a way to solve it.
The reason FakeQuantWithMinMaxArgs only supports uint8 is the quantization range it uses; from the TensorFlow docs:
inputs values are quantized into the quantization range ([0; 2^num_bits - 1] when narrow_range is false and [1; 2^num_bits - 1] when it is true)
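As an illustration (the min/max values below are made up, not from any particular model): with num_bits = 8 and narrow_range = False the range is [0, 255], which is exactly the uint8 range, and the equivalent int8 encoding is obtained by shifting the zero point and the quantized values by 128.

```python
import numpy as np

num_bits = 8
qmin, qmax = 0, 2 ** num_bits - 1      # [0, 255] when narrow_range is False
min_val, max_val = -6.0, 6.0           # example FakeQuantWithMinMaxArgs min/max
scale = (max_val - min_val) / (qmax - qmin)
zero_point_u8 = int(round(-min_val / scale))   # uint8 zero point

x = np.array([-6.0, -1.5, 0.0, 3.0, 6.0])
q_u8 = np.clip(np.round(x / scale) + zero_point_u8, qmin, qmax).astype(np.uint8)

# The same values re-encoded as int8: shift the zero point and the
# quantized values by 128.
zero_point_i8 = zero_point_u8 - 128
q_i8 = (q_u8.astype(np.int16) - 128).astype(np.int8)

print(q_u8, zero_point_u8)   # [  0  96 128 192 255] 128
print(q_i8, zero_point_i8)   # [-128  -32    0   64  127] 0
```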