TensorRT
Wrong quantization result after setting TensorQuantizer.use_fb_fake_quant = True
Description
This is caused by a PyTorch bug, described in the issue "torch.fake_quantize_per_tensor_affine will affect the results of the model if the input memory is not contiguous". It has since been fixed in PyTorch, but it may confuse people like me who are using an older version (1.8.1).
When using the pytorch-quantization tool, before exporting a torch model to ONNX we need to set TensorQuantizer.use_fb_fake_quant = True so that torch.onnx.export can parse the quantizer successfully.
However, this means torch.fake_quantize_per_tensor_affine may be called (tensor_quantizer.py#310), which can trigger the bug.
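Here is a minimal repro sketch of the underlying PyTorch issue (assuming an affected build such as 1.8.1): it runs the same values through fake quantization from a contiguous and a non-contiguous view and compares the outputs. The scale/zero-point values below are arbitrary placeholders.

```python
import torch

x = torch.randn(8, 4)
non_contig = x.t()                # transposed view, not contiguous in memory
contig = non_contig.contiguous()  # same values, contiguous copy

q_non_contig = torch.fake_quantize_per_tensor_affine(non_contig, 0.1, 0, -128, 127)
q_contig = torch.fake_quantize_per_tensor_affine(contig, 0.1, 0, -128, 127)

# False on affected PyTorch versions, True once the bug is fixed
print(torch.equal(q_non_contig, q_contig))
```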
Maybe we could work around the bug by checking whether the input is contiguous before calling torch.fake_quantize_per_tensor_affine in TensorQuantizer, for example as sketched below.
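A minimal sketch of that workaround, as a hypothetical wrapper (not the actual TensorQuantizer code) that makes the input contiguous before the fake-quantize call:

```python
import torch

def safe_fake_quantize_per_tensor(inputs, scale, zero_point, quant_min, quant_max):
    # On affected PyTorch versions (e.g. 1.8.1), fake_quantize_per_tensor_affine
    # can produce wrong values for non-contiguous inputs, so copy to contiguous
    # memory first. On fixed versions this costs at most an extra copy.
    if not inputs.is_contiguous():
        inputs = inputs.contiguous()
    return torch.fake_quantize_per_tensor_affine(
        inputs, scale, zero_point, quant_min, quant_max)
```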
Environment
TensorRT Version: 8.2.4.2
PyTorch Version (if applicable): 1.8.1+cu102
@ttyio Looks like we can improve it ^ ^
Thanks @Wang-Qk, pytorch-quantization is tested with torch 1.9.1+, see the README in https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization; there might be other issues with lower torch versions.
Closing since no activity for more than 3 weeks, thank you!