
Result verification failure with TensorRT 8.5.10 when converting an ONNX model to a TRT model

Open ForMerLen opened this issue 1 year ago • 5 comments

Description

I trained a TensorFlow model and converted it into an ONNX model and a TRT model, but the results of the two models do not match at all. I use the ELU activation function in all layers, and the results don't match. However, when I train the TensorFlow model with a linear activation function in all layers and then convert it to ONNX and TRT, the results of the two models do match.
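For reference, ELU is piecewise (identity for x > 0, alpha * (exp(x) - 1) otherwise), so unlike a linear activation its negative branch involves `exp`, which different backends may implement with slightly different numerics. A minimal NumPy sketch of the function (the `alpha=1.0` default matches Keras and the ONNX `Elu` operator):

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# Negative inputs go through the exponential branch; positive inputs pass through.
print(elu([-1.0, 0.0, 2.0]))
```

Because of the exponential, small per-layer differences (e.g. FP32 vs reduced precision, or different `exp` approximations) can accumulate across layers, which a purely linear network would not show.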

Environment

TensorRT Version: 8.5.10 and 8.5.3.1

NVIDIA GPU: 3090

NVIDIA Driver Version:

CUDA Version: 11.6

CUDNN Version:

Operating System: Ubuntu 20.04

Python Version (if applicable): 3.10

Tensorflow Version (if applicable): 2.10

ForMerLen avatar Feb 28 '24 07:02 ForMerLen

I trained a network model using TensorFlow and converted it into an ONNX model and a TRT model, but the results of the two models do not match. When training in TensorFlow, I used the ELU activation function, and the results of the two models did not match. However, when I used a linear activation function in all layers, the results of the two models matched. How can I solve this problem?

ForMerLen avatar Feb 28 '24 08:02 ForMerLen

@zerollzeng I'm sorry to bother you, but can you give me some advice? Due to hardware and software constraints, I can only use TensorRT version 8.5.10.

ForMerLen avatar Feb 28 '24 08:02 ForMerLen

What if you try inspecting your ONNX with Netron to see how the ONNX changes with respect to the activation function? Also, in what way do the results not match? Are the bounding boxes drawn in different locations, are there fewer detections, or are there more mis-detections?

RajUpadhyay avatar Feb 29 '24 02:02 RajUpadhyay

Could you please check the ONNX accuracy first, e.g. with onnxruntime?
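One way to follow this suggestion is to run the ONNX model under onnxruntime and compare its outputs against the original TensorFlow model on the same inputs. A minimal sketch; the model path, the input feed, and the tolerance are placeholders, not values from this thread (requires `pip install onnxruntime`):

```python
import numpy as np

def max_abs_diff(a, b):
    """Worst-case element-wise absolute difference between two outputs."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.max(np.abs(a - b)))

def check_onnx_vs_reference(onnx_path, ref_outputs, feed, tol=1e-4):
    """Run the ONNX model with onnxruntime and compare against reference outputs.

    `onnx_path` is the exported model file and `feed` maps input names to
    arrays; `ref_outputs` are the TensorFlow outputs for the same inputs.
    """
    import onnxruntime as ort  # imported lazily so max_abs_diff stays standalone
    sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    ort_outputs = sess.run(None, feed)
    return [max_abs_diff(r, o) <= tol for r, o in zip(ref_outputs, ort_outputs)]
```

If TF and onnxruntime already disagree, the problem is in the export; if they agree and only TRT differs, the problem is in the ONNX-to-TRT conversion.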

zerollzeng avatar Mar 01 '24 06:03 zerollzeng

Or it can be quickly checked with Polygraphy: `polygraphy run model.onnx --trt --onnxrt` to see whether the accuracy matches between onnxruntime and TRT. Sometimes the mismatch is caused by the export step.
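If the default comparison is too strict or too loose for this model, Polygraphy also accepts explicit tolerances; a sketch of the invocation (the filename and tolerance values are placeholders):

```shell
# Compare TensorRT vs onnxruntime outputs with explicit tolerances.
# Requires polygraphy, TensorRT, and onnxruntime to be installed.
polygraphy run model.onnx --trt --onnxrt --atol 1e-4 --rtol 1e-4
```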

zerollzeng avatar Mar 01 '24 06:03 zerollzeng

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!

ttyio avatar Mar 26 '24 17:03 ttyio