Accuracy problem between ONNX and FP16 TRT inference
Description
I am encountering an accuracy discrepancy between ONNX inference and TensorRT FP32 inference.
Environment
TensorRT Version: 10.8.0.43
NVIDIA GPU: RTX 3060
NVIDIA Driver Version: 560.35.05
CUDA Version: 12.4.99
CUDNN Version: 9.8.0.87
Operating System: Ubuntu 24.04.1
Python Version (if applicable): 3.11.11
PyTorch Version (if applicable): 2.6.0
Relevant Files
Model link:
https://drive.google.com/drive/folders/1OaczMFXSv2a46QZHwSsn6lT8V3S_Ki-O?usp=sharing
Steps To Reproduce
polygraphy run --onnxrt tlr_202506151003.onnx \
--data-loader-script tool/data_loader.py \
--save-outputs outputs_fp32.json
trtexec --onnx=tlr_202506151003.onnx --saveEngine=model_fp32.plan
polygraphy run --trt model_fp32.plan \
--data-loader-script tool/data_loader.py \
--load-outputs outputs_fp32.json \
--atol 0.01 --rtol 0.01
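For reference, `tool/data_loader.py` is not included in the issue; a Polygraphy data-loader script only needs to expose a `load_data()` generator that yields feed dicts. A minimal sketch, assuming a single input named "images" with shape (1, 3, 640, 640) (both are placeholders; substitute the real input names/shapes of tlr_202506151003.onnx):

```python
# tool/data_loader.py -- minimal Polygraphy data-loader sketch.
# NOTE: the input name "images" and shape (1, 3, 640, 640) are
# placeholders, not taken from the actual model.
import numpy as np

def load_data():
    # Fixed seed so the ONNX-RT and TRT runs see identical inputs.
    rng = np.random.RandomState(0)
    for _ in range(1):  # one iteration, matching the "0/1 iterations" log
        yield {"images": rng.uniform(0.0, 1.0, size=(1, 3, 640, 640)).astype(np.float32)}
```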
Have you tried the latest release?: not yet
I'm not able to reproduce a significant difference at least on newer versions of TRT. Could you try disabling TF32 precision when building (add --noTF32 to your trtexec command) and see if the problem persists?
yes, --noTF32 works, thank you
The precision difference between the ONNX and FP16 TensorRT (TRT) outputs exceeds tolerance at the layer /model/encoder/rope_self_attn_layers/attention_norm/Pow, so I ran
polygraphy run tlr_202506151003.onnx \
--onnxrt \
--trt --fp16 \
--onnx-outputs mark all \
--trt-outputs mark all \
--atol 1e-1 \
--rtol 1e-1 \
--layer-precisions \
/model/encoder/rope_self_attn_layers/attention_norm/Pow:float32 \
--precision-constraints obey \
--fail-fast
but forcing that layer to FP32 this way does not work; the mismatch remains.
Could you check whether it's a limitation with the model itself? You could try converting the ONNX model to FP16 (example) and running it with ONNX-RT to see if you have the same discrepancy.
I converted the model to FP16 and there exists the same discrepancy.
[E] FAILED | Output: 'query' | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E] FAILED | Mismatched outputs: ['box', 'cls', 'color', 'orientation', 'related', 'query']
[E] Accuracy Summary | onnxrt-runner-N0-06/26/25-10:37:38 vs. onnxrt-runner-N0-06/26/25-10:37:18 | Passed: 0/1 iterations | Pass Rate: 0.0%
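For context on those FAILED lines, the default Polygraphy comparison is essentially the usual combined-tolerance elementwise check (same convention as np.isclose), so rel=0.001, abs=0.001 means roughly:

```python
import numpy as np

def outputs_match(expected, actual, rtol=1e-3, atol=1e-3):
    # Elementwise check: |actual - expected| <= atol + rtol * |expected|.
    # A rough sketch of the comparison, not Polygraphy's exact implementation.
    return bool(np.all(np.abs(actual - expected) <= atol + rtol * np.abs(expected)))

e = np.array([1.0, 100.0], dtype=np.float32)
print(outputs_match(e, np.array([1.0005, 100.05], dtype=np.float32)))  # True
print(outputs_match(e, np.array([1.005, 100.05], dtype=np.float32)))   # False
```

A single element outside that envelope fails the whole output, which is why 'query' alone can fail the iteration.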
Seems likely that the model itself is not FP16-friendly, then. You could try converting it in a smarter way using something like ModelOpt's AutoCast tool.
Hi, I also have accuracy issues when I convert the model to FP16 using TensorRT.