TensorRT
Accuracy loss of TensorRT 8.6 when running INT8 Quantized Resnet18 on GPU A4000
Description
When performing ResNet18 PTQ with TensorRT Model Optimizer (modelopt), I ran into the following issue when compiling the quantized model with TRT.
First off, I started with a pretrained resnet18 from torchvision and replaced the last fully connected layer to fit my dataset (for example, CIFAR-10). I also replaced all the skip connections (the residual "+") with an ElementwiseAdd module and defined its quantization layer myself (code attached at the end). The reason for this is to facilitate Q/DQ fusion so that every layer can run in INT8.
Then, when compiling the exported ONNX model with TRT, I found that the TRT output differs significantly both from the fake-Q/DQ model in Python and from the fake-Q/DQ ONNX model run with ONNX Runtime (np.allclose with a 1e-3 tolerance fails). Comparing the TRT output against the native model, the classification results disagree on ~2.3% of the test samples.
I discussed this with the TRT ModelOpt team in this issue, and they suggested filing a bug report here.
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: A4000
NVIDIA Driver Version: 535.183.01
CUDA Version: 12.2
Python Version (if applicable): 3.10.1
PyTorch Version (if applicable): 2.4.0+cu124
Relevant Files
Model link: You can download the onnx model and the TRT engine here: https://file.io/GnuiEMNeebQ1
Steps To Reproduce
Run the TRT engine via the Python API and the ONNX model via ONNX Runtime on the CIFAR-10 test set using the following data loader, then compare the results (a sketch of the comparison loop follows the loader).
import torch
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the exact preprocessing was not included above
testset = datasets.CIFAR10(root='./data', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=False)
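
Roughly, the comparison looks like this. Note that run_trt_engine is a hypothetical wrapper around the TensorRT Python inference API, and the ONNX input name "input" and the filename resnet18_qdq.onnx are placeholders for the linked model:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet18_qdq.onnx", providers=["CPUExecutionProvider"])

close_failures = 0
label_disagreements = 0
for images, _ in testloader:
    x = images.numpy()
    ort_out = sess.run(None, {"input": x})[0]  # input name assumed
    trt_out = run_trt_engine(x)                # hypothetical wrapper around the TRT Python API
    if not np.allclose(ort_out, trt_out, atol=1e-3):
        close_failures += 1
    if ort_out.argmax() != trt_out.argmax():
        label_disagreements += 1
print(f"allclose failures: {close_failures}, label disagreements: {label_disagreements}")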
Have you tried the latest release?: I haven't tried TRT 10, and we don't plan to upgrade in the short term. I was under the impression that 8.6 should be fine.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
Yes. Running the Q/DQ ONNX model with ONNX Runtime gives ~1% classification disagreement against the native (fake-quantized PyTorch) model.
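
For a direct TRT-vs-ONNX-Runtime comparison on random inputs, a polygraphy invocation along these lines should reproduce the mismatch (the filename resnet18_qdq.onnx is a placeholder for the linked model):

polygraphy run resnet18_qdq.onnx --trt --onnxrt --atol 1e-3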
Appendix
Visualizing the TRT engine, it is completely in line with my expectations, with everything fused into INT8 kernels.
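
The quantized ElementwiseAdd mentioned in the description follows this pattern. This is a minimal sketch, assuming modelopt's TensorQuantizer with its default 8-bit config; the exact import path can vary across modelopt versions:

import torch.nn as nn
from modelopt.torch.quantization.nn import TensorQuantizer  # import path assumed; varies by version

class QuantElementwiseAdd(nn.Module):
    """Residual add with both operands quantized, so TRT can fuse the Q/DQ pairs into an INT8 add."""
    def __init__(self):
        super().__init__()
        # One fake-quantizer per operand (default config assumed to be 8-bit).
        self.input_quantizer_0 = TensorQuantizer()
        self.input_quantizer_1 = TensorQuantizer()

    def forward(self, x, residual):
        return self.input_quantizer_0(x) + self.input_quantizer_1(residual)

Quantizing both inputs of the add is what lets TRT keep the residual branch in INT8 instead of falling back to higher precision for the addition.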