
Get wrong quantization result after setting TensorQuantizer.use_fb_fake_quant = True

Wang-Qk opened this issue 3 years ago · 1 comment

Description

This is caused by a bug in PyTorch, described in the issue "torch.fake_quantize_per_tensor_affine will affect the results of the model if the input memory is not contiguous". It has since been fixed in PyTorch, but it may still confuse anyone who, like me, is using an older version (1.8.1).
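A minimal sketch of the upstream behavior (assuming an affected build such as torch 1.8.1; the scale/zero-point values are arbitrary placeholders):

```python
import torch

x = torch.randn(4, 8).t()           # transposed view -> non-contiguous memory
assert not x.is_contiguous()

scale, zero_point = 0.1, 0          # placeholder quantization parameters
quant_min, quant_max = -128, 127

a = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, quant_min, quant_max)
b = torch.fake_quantize_per_tensor_affine(x.contiguous(), scale, zero_point, quant_min, quant_max)

# On affected versions (e.g. 1.8.1) the two results can differ;
# on fixed versions this prints True.
print(torch.equal(a, b))
```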

When using the pytorch-quantization tool, before exporting a torch model to an ONNX one we need to set TensorQuantizer.use_fb_fake_quant = True so that torch.onnx.export can parse the quantizers successfully. However, this means torch.fake_quantize_per_tensor_affine may be called (tensor_quantizer.py#310), which can trigger the bug.
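For context, the export flow looks roughly like this (model and dummy_input are placeholders for a calibrated QAT model; the flag itself follows the pytorch-quantization usage pattern):

```python
import torch
from pytorch_quantization import nn as quant_nn

# Make every TensorQuantizer emit torch.fake_quantize_* ops
# so torch.onnx.export can handle them.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

# model / dummy_input are placeholders for the user's QAT model and input.
torch.onnx.export(model, dummy_input, "model_qat.onnx", opset_version=13)
```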

Maybe we could fix this by checking whether the input is contiguous before calling torch.fake_quantize_per_tensor_affine in TensorQuantizer, as sketched below.
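A minimal sketch of that guard (the wrapper name is hypothetical; in TensorQuantizer it would sit just before the call at tensor_quantizer.py#310):

```python
import torch

def fake_quantize_contiguous(inputs, scale, zero_point, quant_min, quant_max):
    """Guard against the old-PyTorch non-contiguous-input bug.

    .contiguous() is a no-op for tensors that are already contiguous,
    so the common path pays no extra copy.
    """
    if not inputs.is_contiguous():
        inputs = inputs.contiguous()
    return torch.fake_quantize_per_tensor_affine(
        inputs, scale, zero_point, quant_min, quant_max)
```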

Environment

TensorRT Version: 8.2.4.2
PyTorch Version (if applicable): 1.8.1+cu102

Wang-Qk · Aug 04 '22 08:08

@ttyio Looks like we can improve it ^ ^

zerollzeng · Aug 04 '22 14:08

Thanks @Wang-Qk, pytorch-quantization is tested with torch 1.9.1+ (see the README at https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization); there might be other issues with lower torch versions.

ttyio · Feb 15 '23 09:02

Closing since there has been no activity for more than 3 weeks, thank you!

ttyio · Mar 28 '23 03:03