
Accuracy loss of TensorRT 8.6 when running INT8 Quantized Resnet18 on GPU A4000

Open · YixuanSeanZhou opened this issue · 11 comments

Description

When performing ResNet18 PTQ (post-training quantization) using TensorRT Model Optimizer (TRT-modelopt), I encountered the following issue when compiling the model with TRT.

First off, I started with a pretrained ResNet18 from torchvision. I replaced the last fully connected layer to fit my dataset (for example, CIFAR-10). I also replaced all the skip connections (the elementwise plus) with an ElementwiseAdd layer whose quantization I defined myself (code attached at the end). The reason I did this is to facilitate Q/DQ fusion so that every layer can run in INT8.
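Since the custom add wrapper is what enables full INT8 fusion, here is a minimal sketch of the idea using plain PyTorch fake quantization (the actual module is the one attached at the end; QuantElementwiseAdd and the fixed scale below are illustrative, and in practice the scale comes from PTQ calibration):

import torch
import torch.nn as nn

class QuantElementwiseAdd(nn.Module):
    # Fake-quantizes both inputs of the skip-connection add, so the exported
    # ONNX graph carries Q/DQ pairs that TRT can fuse into an INT8 add kernel.
    def __init__(self, scale: float = 0.1):
        super().__init__()
        self.scale = scale  # per-tensor scale; set by calibration in practice

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        x = torch.fake_quantize_per_tensor_affine(x, self.scale, 0, -128, 127)
        y = torch.fake_quantize_per_tensor_affine(y, self.scale, 0, -128, 127)
        return x + y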

Then, when compiling the exported ONNX model with TRT, I found that the TRT outputs are very different from those of the fake Q/DQ model in Python, and also from the fake Q/DQ ONNX model run with ONNX Runtime (np.allclose with a 1e-3 threshold fails). Comparing the TRT and native outputs, the classification results disagree on ~2.3% of samples.
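A sketch of the kind of per-batch comparison described above (the ONNX file name and the input tensor name "input" are assumptions):

import numpy as np
import onnxruntime as ort

# Reference: run the fake Q/DQ ONNX model with ONNX Runtime.
sess = ort.InferenceSession("resnet18_qdq.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {"input": x_np})[0]

# trt_out holds the TRT engine's logits for the same input x_np.
print(np.allclose(ort_out, trt_out, atol=1e-3))  # fails in my runs
# Accumulating argmax mismatches over the test set gives the disagreement rate.
print(ort_out.argmax(axis=1) != trt_out.argmax(axis=1))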

I discussed this with the TRT modelopt team in this issue, and they suggested filing a bug report here.

Environment

TensorRT Version: 8.6.1

NVIDIA GPU: A4000

NVIDIA Driver Version: 535.183.01

CUDA Version: 12.2

Python Version (if applicable): 3.10.1

PyTorch Version (if applicable): '2.4.0+cu124'

Relevant Files

Model link: You can download the onnx model and the TRT engine here: https://file.io/GnuiEMNeebQ1

Steps To Reproduce

Run the TRT engine using the Python API and the ONNX model on the CIFAR-10 test set with the following data loader, and compare the results (a sketch of running the engine follows the loader).

import torch
from torchvision import datasets

# `transform` is the same input preprocessing used during calibration
testset = datasets.CIFAR10(root='./data', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=False)
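For completeness, a minimal sketch of running the engine with the TRT 8.6 Python API (the engine path, input/output binding order, and output shape are assumptions, not the exact repro script):

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("resnet18_int8.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

def trt_infer(x: np.ndarray) -> np.ndarray:
    # Assumes binding 0 is the input and binding 1 is the (1, 10) logits output.
    out = np.empty((1, 10), dtype=np.float32)
    d_in = cuda.mem_alloc(x.nbytes)
    d_out = cuda.mem_alloc(out.nbytes)
    cuda.memcpy_htod(d_in, np.ascontiguousarray(x))
    context.execute_v2([int(d_in), int(d_out)])
    cuda.memcpy_dtoh(out, d_out)
    return out

Iterating trt_infer over testloader and comparing argmax labels against the native PyTorch model gives the kind of per-sample disagreement described above.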

Have you tried the latest release?: Haven't tried TRT 10, but we don't plan to upgrade in the short term. I was under the impression that 8.6 should be okay.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes. ONNX Runtime shows ~1% classification disagreement with the native model.

Appendix

Visualizing the TRT engine, it is completely within my expectations, with everything fused into INT8 kernels. (Engine visualization attached: trt_engine_0)
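For anyone reproducing this, one way to check the per-layer precisions of the built engine is TensorRT's engine inspector (a sketch; the engine path is illustrative, and per-layer detail requires the engine to be built with detailed profiling verbosity):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("resnet18_int8.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# Dump per-layer information (precision, fused ops) as JSON.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))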

YixuanSeanZhou · Aug 14 '24