
Accuracy problem between ONNX and FP16 TRT inference

Open KexianShen opened this issue 10 months ago • 3 comments

Description

I am encountering an accuracy discrepancy between ONNX inference and TensorRT FP32 inference.

Environment

TensorRT Version: 10.8.0.43

NVIDIA GPU: RTX 3060

NVIDIA Driver Version: 560.35.05

CUDA Version: 12.4.99

CUDNN Version: 9.8.0.87

Operating System: Ubuntu 24.04.1

Python Version (if applicable): 3.11.11

PyTorch Version (if applicable): 2.6.0

Relevant Files

Model link:

https://drive.google.com/drive/folders/1OaczMFXSv2a46QZHwSsn6lT8V3S_Ki-O?usp=sharing

Steps To Reproduce

polygraphy run --onnxrt tlr_202506151003.onnx \
    --data-loader-script tool/data_loader.py \
    --save-outputs outputs_fp32.json

trtexec --onnx=tlr_202506151003.onnx --saveEngine=model_fp32.plan

polygraphy run --trt model_fp32.plan \
    --data-loader-script tool/data_loader.py \
    --load-outputs outputs_fp32.json \
    --atol 0.01 --rtol 0.01
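
The `tool/data_loader.py` script is not included in the issue; for reference, Polygraphy's `--data-loader-script` option expects the script to define a generator (named `load_data` by default) that yields one feed dict per comparison iteration. A minimal sketch of such a script, where the input name, shape, and iteration count are placeholders that would need to match the actual model:

```python
import numpy as np

# Hypothetical tool/data_loader.py for Polygraphy's --data-loader-script.
# The input name "input" and shape (1, 3, 224, 224) are assumptions --
# they must match the ONNX model's real input signature.
def load_data():
    rng = np.random.default_rng(seed=0)  # fixed seed so ONNX-RT and TRT runs see identical data
    for _ in range(4):  # one feed dict per comparison iteration
        yield {"input": rng.standard_normal((1, 3, 224, 224)).astype(np.float32)}
```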

Have you tried the latest release?: not yet

KexianShen avatar Jun 18 '25 03:06 KexianShen

I'm not able to reproduce a significant difference at least on newer versions of TRT. Could you try disabling TF32 precision when building (add --noTF32 to your trtexec command) and see if the problem persists?

pranavm-nvidia avatar Jun 18 '25 20:06 pranavm-nvidia
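
For context: on Ampere GPUs like the RTX 3060, TensorRT may run FP32 matmuls/convolutions in TF32 by default, which keeps FP32's exponent range but only 10 explicit mantissa bits instead of 23. A rough numpy sketch of the resulting precision loss (approximating TF32 by zeroing the low 13 mantissa bits; real Tensor Cores round rather than truncate, so this is only an illustration):

```python
import numpy as np

def to_tf32(x):
    # TF32 keeps 10 explicit mantissa bits vs FP32's 23, so zeroing the
    # low 13 bits of the FP32 encoding approximates the precision loss.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

a = np.float32(1.0) + np.float32(2.0 ** -20)  # representable in FP32
print(a == to_tf32(a))  # False: the low mantissa bits are discarded
```

This is why a comparison against full-FP32 ONNX-RT output can exceed tight tolerances like 0.01 even though the engine was "built in FP32", and why `--noTF32` removes the discrepancy.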

yes, --noTF32 works, thank you

KexianShen avatar Jun 19 '25 02:06 KexianShen

The precision difference between ONNX and FP16 TensorRT (TRT) outputs exceeds tolerance at the layer /model/encoder/rope_self_attn_layers/attention_norm/Pow, so I ran:

polygraphy run tlr_202506151003.onnx \
    --onnxrt \
    --trt --fp16 \
    --onnx-outputs mark all \
    --trt-outputs mark all \
    --atol 1e-1 \
    --rtol 1e-1 \
    --layer-precisions /model/encoder/rope_self_attn_layers/attention_norm/Pow:float32 \
    --precision-constraints obey \
    --fail-fast

but forcing that layer to float32 this way doesn't resolve the mismatch.

KexianShen avatar Jun 19 '25 09:06 KexianShen

Could you check whether it's a limitation with the model itself? You could try converting the ONNX model to FP16 (example) and running it with ONNX-RT to see if you have the same discrepancy.

pranavm-nvidia avatar Jun 24 '25 17:06 pranavm-nvidia

I converted the model to FP16 and the same discrepancy persists.

[E]         FAILED | Output: 'query' | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E]     FAILED | Mismatched outputs: ['box', 'cls', 'color', 'orientation', 'related', 'query']
[E] Accuracy Summary | onnxrt-runner-N0-06/26/25-10:37:38 vs. onnxrt-runner-N0-06/26/25-10:37:18 | Passed: 0/1 iterations | Pass Rate: 0.0%

KexianShen avatar Jun 26 '25 02:06 KexianShen

Seems likely that the model itself is not FP16-friendly then. You can try to convert it in a smarter way using something like ModelOpt's AutoCast tool.

pranavm-nvidia avatar Jun 26 '25 17:06 pranavm-nvidia
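
As an illustration of why this particular layer can be FP16-unfriendly: the name `attention_norm/Pow` suggests an RMS-style normalization (an assumption; the actual graph is not shown here), and the squaring step of such a norm overflows FP16's maximum representable value (65504) for moderately large activations. A numpy sketch under that assumption:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS-style normalization: x / sqrt(mean(x**2) + eps).
    # The squaring (the ONNX `Pow` node) is the fragile step in FP16.
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

x = np.full((1, 8), 300.0, dtype=np.float32)
print(rms_norm(x))  # finite: each element normalizes to ~1.0

# 300**2 = 90000 exceeds the FP16 max (65504), so x**2 overflows to inf
with np.errstate(over="ignore"):
    print(rms_norm(x.astype(np.float16)))  # the norm collapses to zeros
```

This kind of overflow reproduces in pure ONNX-RT FP16 as well, which matches the observation above that the discrepancy is a property of the model, not of TensorRT.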

Hi, I also have accuracy issues when I convert the model to FP16 using TensorRT.

geraldstanje avatar Jun 27 '25 16:06 geraldstanje