TensorRT
Wrong quantization result after setting TensorQuantizer.use_fb_fake_quant = True
Description
This is caused by a PyTorch bug, described in the issue "torch.fake_quantize_per_tensor_affine will affect the results of the model if the input memory is not contiguous". It has since been fixed in PyTorch, but it may confuse people like me who are using an older version (1.8.1).
When using the pytorch-quantization tool, before exporting a torch model to ONNX we need to set TensorQuantizer.use_fb_fake_quant = True so that torch.onnx.export can parse the quantizer successfully.
However, this means torch.fake_quantize_per_tensor_affine may be called (tensor_quantizer.py#310), which can trigger the bug.
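Here is a minimal repro sketch of the underlying PyTorch issue (assuming an affected build such as 1.8.1): it runs the same values through fake quantization from a contiguous and a non-contiguous view and compares the outputs. The scale/zero-point values below are arbitrary placeholders.

```python
import torch

x = torch.randn(8, 4)
non_contig = x.t()                # transposed view, not contiguous in memory
contig = non_contig.contiguous()  # same values, contiguous copy

q_non_contig = torch.fake_quantize_per_tensor_affine(non_contig, 0.1, 0, -128, 127)
q_contig = torch.fake_quantize_per_tensor_affine(contig, 0.1, 0, -128, 127)

# False on affected PyTorch versions, True once the bug is fixed
print(torch.equal(q_non_contig, q_contig))
```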
Maybe we could work around the bug by checking whether the input is contiguous before calling torch.fake_quantize_per_tensor_affine in TensorQuantizer, for example as sketched below.
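A minimal sketch of that workaround, as a hypothetical wrapper (not the actual TensorQuantizer code) that makes the input contiguous before the fake-quantize call:

```python
import torch

def safe_fake_quantize_per_tensor(inputs, scale, zero_point, quant_min, quant_max):
    # On affected PyTorch versions (e.g. 1.8.1), fake_quantize_per_tensor_affine
    # can produce wrong values for non-contiguous inputs, so copy to contiguous
    # memory first. On fixed versions this costs at most an extra copy.
    if not inputs.is_contiguous():
        inputs = inputs.contiguous()
    return torch.fake_quantize_per_tensor_affine(
        inputs, scale, zero_point, quant_min, quant_max)
```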
Environment
TensorRT Version: 8.2.4.2
PyTorch Version (if applicable): 1.8.1+cu102
@ttyio Looks like we can improve it ^ ^
Thanks @Wang-Qk, pytorch-quantization is tested with torch 1.9.1+, see the README in https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization; there might be other issues with lower torch versions.
Closing since no activity for more than 3 weeks, thank you!