
Hardcoded UInt8 dtype in FakeQuantWithMinMaxArgs, unsupported in TensorRT

doomooo opened this issue on May 07 '22 · 4 comments

These two lines look odd to me... Why hardcode the input dtype to uint8? That type is unsupported in TensorRT. How can I avoid this problem when converting a QAT int8 model to ONNX and then to TensorRT? https://github.com/onnx/tensorflow-onnx/blob/482330f9958eb45c805933f04e2b0a5c7a494f23/tf2onnx/onnx_opset/quantize.py#L57

doomooo · May 07 '22
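For context, those lines sit in the FakeQuantWithMinMaxArgs handler, which (as I understand it) rewrites the op into a QuantizeLinear/DequantizeLinear pair whose zero-point tensor is created with a hardcoded UINT8 dtype. Here is a minimal standalone sketch of the resulting pattern; the min/max values and tensor names are illustrative, not the converter's actual ones:

```python
from onnx import TensorProto, helper

# Illustrative FakeQuantWithMinMaxArgs attributes: min=-6, max=6, num_bits=8.
min_val, max_val, num_bits = -6.0, 6.0, 8
scale = (max_val - min_val) / (2 ** num_bits - 1)  # ~0.047
zero_point = int(round(-min_val / scale))          # 128

scale_t = helper.make_tensor("y_scale", TensorProto.FLOAT, [], [scale])
# The dtype below is the hardcoded choice in question; TensorRT only
# accepts INT8 zero-points on Q/DQ nodes.
zp_t = helper.make_tensor("y_zero_point", TensorProto.UINT8, [], [zero_point])

q = helper.make_node("QuantizeLinear", ["x", "y_scale", "y_zero_point"], ["x_q"])
dq = helper.make_node("DequantizeLinear", ["x_q", "y_scale", "y_zero_point"], ["y"])
```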

It seems this behavior dates from the PR commit "fix operator for fakequantize", and the output type constraint allows int8 per the ONNX QuantizeLinear-13 spec. Need to do more investigation. Hi @xadupre, could you please take a look and share any suggestions for this issue? Thanks!

hwangdeyu · May 11 '22

How do we choose between int8 and uint8? I don't remember whether TensorFlow gives enough information to make the choice. Otherwise, it is possible to add a converter option to force one type at conversion time. Another possibility is to change the type after the conversion is done, with a rewriter.

xadupre · May 12 '22
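For what it's worth, the rewriter route xadupre mentions is mechanical, because uint8 and int8 quantization differ only by a constant offset: scale * (q - zp) == scale * ((q - 128) - (zp - 128)), so shifting both the stored zero-point and the quantized values by 128 preserves the dequantized result exactly. A minimal post-conversion sketch; the function name and file paths are hypothetical, and it assumes every zero-point is a graph initializer:

```python
import numpy as np
import onnx
from onnx import TensorProto, numpy_helper

def qdq_uint8_to_int8(model: onnx.ModelProto) -> onnx.ModelProto:
    """Shift every Q/DQ uint8 zero-point to int8 (q_s8 = q_u8 - 128)."""
    inits = {t.name: t for t in model.graph.initializer}
    zp_names = {
        node.input[2]
        for node in model.graph.node
        if node.op_type in ("QuantizeLinear", "DequantizeLinear") and len(node.input) >= 3
    }
    for name in zp_names:
        t = inits.get(name)
        if t is None or t.data_type != TensorProto.UINT8:
            continue  # already int8, or the zero-point is not an initializer
        shifted = numpy_helper.to_array(t).astype(np.int32) - 128
        t.CopyFrom(numpy_helper.from_array(shifted.astype(np.int8), name))
    return model

model = qdq_uint8_to_int8(onnx.load("model_qdq_uint8.onnx"))  # hypothetical path
onnx.save(model, "model_qdq_int8.onnx")
```

Note that TensorRT additionally expects symmetric int8 quantization (zero-point 0), which this shift yields exactly when the original uint8 zero-point was 128.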

Hi @doomooo, sorry for the late reply. Could you please share your INT8 case or a simple reproduction script? We will try to find a way to solve it.

hwangdeyu · Aug 25 '22
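In case it helps others, a reproduction should not need a full QAT model; wrapping the single op ought to surface the uint8 zero-point. A sketch assuming TF 2.x and the tf2onnx Python API, not the original reporter's model:

```python
import tensorflow as tf
import tf2onnx

@tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
def fq(x):
    # The op whose conversion hardcodes uint8.
    return tf.quantization.fake_quant_with_min_max_args(x, min=-6.0, max=6.0, num_bits=8)

model_proto, _ = tf2onnx.convert.from_function(
    fq, input_signature=[tf.TensorSpec([1, 4], tf.float32)], opset=13
)
# The zero-point initializer of the emitted Q/DQ pair should print as UINT8
# (data_type == 2 in the TensorProto.DataType enum).
for init in model_proto.graph.initializer:
    print(init.name, init.data_type)
```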

The reason the FakeQuantWithMinMaxArgs conversion only supports uint8 is the quantization range it targets. From the TensorFlow documentation:

> inputs values are quantized into the quantization range ([0; 2^num_bits - 1] when narrow_range is false and [1; 2^num_bits - 1] when it is true)

hwangdeyu · Aug 31 '22
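Concretely, with num_bits=8 and narrow_range=false that range is [0; 255], which is exactly the uint8 value range and cannot be represented as int8, hence the hardcoded type. A quick check of the scale/zero-point arithmetic (my own illustration, not code from tf2onnx):

```python
min_val, max_val, num_bits = -6.0, 6.0, 8    # example FakeQuant attributes
qmin, qmax = 0, 2 ** num_bits - 1            # narrow_range=false -> [0; 255]
scale = (max_val - min_val) / (qmax - qmin)  # ~0.0471
zero_point = round(qmin - min_val / scale)   # 128, which does not fit in int8 (max 127)
print(scale, zero_point)
```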