TensorRT
RT-DETR FP16 inference gets correct results on V100 but weird results on A10
Description
I am using tritonserver:23.10 to deploy an RT-DETR model. ONNX Runtime FP32, ONNX Runtime FP16, and Tesla V100 TRT 8.6.1 FP16/FP32 all produce the correct result. But on a Tesla A10, TRT 8.6.1 produces the correct result in FP32 and a weird result in FP16. With the same code, the FP16 results should be the same on V100 and A10.
A10 - TRT - FP16
V100 - TRT - FP16
Environment
TensorRT Version: TensorRT 8.6.1
NVIDIA GPU: Tesla V100 / A10
NVIDIA Driver Version: 515.65.01
CUDA Version: 12.2
CUDNN Version: v8
Operating System: Ubuntu 22.04
Python Version (if applicable): 3.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.0.1
Baremetal or Container (if so, version): tritonserver 23.10
Relevant Files
https://github.com/lyuwenyu/RT-DETR
Steps To Reproduce
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
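One way to quantify "weird" FP16 results before filing them as a bug is to dump outputs from the FP32 and FP16 engines on the same input and compare them numerically. The snippet below is a minimal sketch of such a comparison (not from this thread); it assumes you have already saved both output tensors, and it simulates the FP16 run here by rounding the reference to half precision:

```python
import numpy as np

def compare_outputs(ref, test, rtol=1e-2, atol=1e-2):
    """Quantify the mismatch between an FP32 reference and an FP16 run."""
    ref = np.asarray(ref, dtype=np.float32).ravel()
    test = np.asarray(test, dtype=np.float32).ravel()
    # Largest element-wise deviation between the two runs.
    max_abs = float(np.max(np.abs(ref - test)))
    # Cosine similarity: close to 1.0 means the outputs broadly agree.
    cos = float(np.dot(ref, test) /
                (np.linalg.norm(ref) * np.linalg.norm(test) + 1e-12))
    ok = bool(np.allclose(ref, test, rtol=rtol, atol=atol))
    return max_abs, cos, ok

# Stand-in for a dumped FP32 output; replace with the real engine outputs.
ref = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
fp16_run = ref.astype(np.float16).astype(np.float32)

max_abs, cos, ok = compare_outputs(ref, fp16_run)
print(max_abs, cos, ok)
```

A healthy FP16 build shows a small max deviation and cosine similarity near 1.0; a genuinely broken build (like the A10 result reported here) collapses both numbers, which is much stronger evidence than eyeballing detections.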
- Did you test metric like mAP?
- Could you please share the ONNX model here for reproduction?
Thanks!
FP16 may introduce an accuracy drop, so it's hard to say whether it's a bug unless we have a generic metric like mAP.
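Since the maintainer asks for a metric rather than screenshots, a quick proxy short of a full mAP run is to match detections from the FP32 and FP16 engines and check their IoU. This is a hedged sketch with made-up boxes (the real values would come from the two engines):

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

# Hypothetical matched detections from the FP32 and FP16 engines on one image.
fp32_boxes = [[10, 10, 110, 110], [200, 50, 300, 150]]
fp16_boxes = [[11, 9, 111, 109], [205, 55, 305, 155]]

# A mild precision drop keeps matched IoUs near 1.0; a "weird" FP16 result
# like the one reported shows up as IoUs collapsing toward 0.
ious = [iou(a, b) for a, b in zip(fp32_boxes, fp16_boxes)]
print(ious)
```

Averaging such IoUs (or running the model zoo's mAP evaluation) gives the generic number the maintainer is asking for, and makes the V100-vs-A10 difference directly comparable.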
closing since no activity for more than 3 weeks, thanks all!
https://github.com/NVIDIA/TensorRT/issues/3700
Please reopen this issue to track the Ampere accuracy loss issue on all DETR-like models.