TensorRT

The YOLOv5 model produces incorrect detection results with TensorRT 7.1

Open Monlter opened this issue 1 year ago • 7 comments

Description

I exported the official YOLOv5 model to ONNX and converted it to an engine with trtexec on two devices. The outputs from the two devices differ significantly. TensorRT 7.1 outputs multiple detection boxes for the same target; the shape of each box is close to the preset anchor shape, as if no width/height (WH) adjustment had been applied, and surprisingly their confidences are very high. The TensorRT 8.5 output is normal. Note that the ONNX file is the same on both devices, and the conversion command on both is `/usr/src/tensorrt/bin/trtexec --onnx=./.onnx --saveEngine=./.engine --workspace=10240 --fp16`. The inference images and code are also identical.

Environment

machine_1: ARM, JetPack: 4.4, TensorRT Version: 7.1.3.0, NVIDIA GPU: Xavier [16GB], CUDA Version: 10.2.89, cuDNN Version: 8.0.0.180

machine_2: x64, TensorRT Version: 8.5.1.7, NVIDIA GPU: A30, CUDA Version: 11.6, cuDNN Version: ***

Operating System: Python Version: 3.8, PyTorch Version: 1.8.0, onnx Version: 1.16.1

This is the detection result on the TensorRT 7.1 device: [screenshot: tensorrt7.1 detections]

This is the detection result on the TensorRT 8.5 device: [screenshot: tensorrt8.5 detections]

Monlter avatar Jun 26 '24 08:06 Monlter

TensorRT 7.1 is too old a version.

lix19937 avatar Jun 26 '24 10:06 lix19937

> tensorrt7.1 is too old version.

Yes, but machine 1 can only use TensorRT 7.1.

Monlter avatar Jun 27 '24 01:06 Monlter

> This is the detection result on the tensor7.1 device

Maybe it needs an NMS step. @Monlter

lix19937 avatar Jun 27 '24 05:06 lix19937
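For reference, the IoU-based NMS pass suggested above can be sketched in plain NumPy. This is a minimal illustration, not the post-processing code used in this issue; the `[x1, y1, x2, y2]` box format and the function names are assumptions for the sketch.

```python
import numpy as np

def iou(box, boxes):
    # Vectorized IoU of one box against many; boxes are [x1, y1, x2, y2].
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thres=0.4):
    # Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thres]
    return keep
```

Note that NMS can only merge boxes that overlap more than the IoU threshold; it cannot fix boxes whose predicted width/height is simply wrong.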

> This is the detection result on the tensor7.1 device maybe need nms process. @Monlter

The above results are all post-processed, with the NMS IoU threshold set to 0.4. If the NMS threshold is set lower, some boxes can be filtered out, but the highest-confidence boxes still do not match expectations because they do not fit the edges of the object. The results look as if the WH predicted by the model had no effect, so the shapes of the resulting boxes tend toward the shapes of the configured anchors.

Monlter avatar Jun 27 '24 05:06 Monlter
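For context on the WH remark above: the standard YOLOv5 head decodes the raw (tx, ty, tw, th) outputs roughly as below, and if the tw/th terms were lost or misinterpreted, every box would keep its anchor's shape, which matches the described symptom. This is a minimal NumPy sketch of the published YOLOv5 decode for a single prediction, not the reporter's actual code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolov5(raw, grid_xy, anchor_wh, stride):
    """Decode one YOLOv5 prediction raw = (tx, ty, tw, th).

    xy = (sigmoid(t) * 2 - 0.5 + grid) * stride
    wh = (sigmoid(t) * 2) ** 2 * anchor
    """
    t = sigmoid(np.asarray(raw, dtype=np.float64))
    xy = (t[:2] * 2.0 - 0.5 + np.asarray(grid_xy)) * stride
    # The WH adjustment: without it, wh would stay at the anchor's shape.
    wh = (t[2:4] * 2.0) ** 2 * np.asarray(anchor_wh)
    return np.concatenate([xy, wh])
```

With zero logits, sigmoid gives 0.5, so `wh` comes out exactly equal to the anchor; boxes that all look anchor-shaped suggest the tw/th path is effectively producing constant values.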

You can check the result with fp32 precision on TRT 7.1:

/usr/src/tensorrt/bin/trtexec --onnx=./onnx --saveEngine=./.engine --workspace=10240

and mark the result v71-fp32.

If v71-fp32 is similar to the fp16 result on TRT 7.1, TRT 7.1 may have an issue.
If v71-fp32 is similar to the fp16 result on TRT 8.5, you may need to compare each layer's output; you can use Polygraphy for that. @Monlter

lix19937 avatar Jun 27 '24 08:06 lix19937

> You can check the result with fp32 precision on TRT 7.1:
>
> /usr/src/tensorrt/bin/trtexec --onnx=./onnx --saveEngine=./.engine --workspace=10240
>
> and mark the result v71-fp32. If v71-fp32 is similar to the fp16 result on TRT 7.1, TRT 7.1 may have an issue. If v71-fp32 is similar to the fp16 result on TRT 8.5, compare each layer's output; you can use Polygraphy. @Monlter

I tried the fp32 conversion, but the TensorRT 7.1 result remained unchanged.

Monlter avatar Jun 27 '24 08:06 Monlter

Use the following:

polygraphy run debug.onnx --trt --onnxrt --atol 0.001 --rtol 0.001  

lix19937 avatar Jun 27 '24 09:06 lix19937
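The `--atol`/`--rtol` flags above apply an elementwise tolerance check of the same form as `numpy.isclose`, i.e. `|a - b| <= atol + rtol * |b|`. A minimal standalone version for comparing two dumped output arrays might look like this (the function name is illustrative, not part of Polygraphy):

```python
import numpy as np

def outputs_match(out_a, out_b, atol=1e-3, rtol=1e-3):
    # Elementwise check |a - b| <= atol + rtol * |b|, as numpy.isclose does.
    return bool(np.all(np.isclose(out_a, out_b, rtol=rtol, atol=atol)))
```

Dumping the TRT 7.1 and TRT 8.5 outputs to `.npy` files and comparing them this way, per layer, would localize where the two engines diverge.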

No further development is planned for 7.1 or 8.5. Please use the most recent version available for your device.

kevinch-nv avatar Feb 13 '25 00:02 kevinch-nv