TensorRT Comparison of infer speed between different TRT8.5.3.1 VS TRT 10.5.0.18 on GPU 3060-12G/4060Ti-16G

Description

I tried to perform inference time statistics for the segmentation model on my machine(bisenetv2) between TRT-8.5.3.1 VS TRT-10.5.0.18. But I found a big difference in inference speed between the two versions. While 5.8ms with TTR8 and 9.0ms with TRT10 for the same model using "trtexec --loadEngine". It doesn't look right, I need your help. Thanks!!

Environment

TensorRT Version: TRT8.5.3.1/TRT-10.5.0.18

NVIDIA GPU: RTX 3060 - 12G

NVIDIA Driver Version: 536.23

CUDA Version: V11.6

CUDNN Version: V6.5.0

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link: https://github.com/CoinCheung/BiSeNet/releases/tag/0.0.0

Steps To Reproduce

Commands or scripts:

For TRT8: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet8.trt --fp16

using trtexec.exe --loadEngine=./besnet8.trt get result: "GPU Compute Time: min = 5.3894 ms, max = 7.0011 ms, mean = 5.82974 ms, median = 5.73438 ms, percentile(90%) = 6.32324 ms, percentile(95%) = 6.56079 ms, percentile(99%) = 7.0011 ms"

For TRT10: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet10.trt --fp16

using trtexec.exe --loadEngine=./besnet10.trt get result: "GPU Compute Time: min = 8.03729 ms, max = 10.4243 ms, mean = 9.02904 ms, median = 8.96878 ms, percentile(90%) = 9.50806 ms, percentile(95%) = 9.74951 ms, percentile(99%) = 10.4243 ms"

Have you tried the latest release?: Not yet.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):