Comparison of inference speed between TRT 8.5.3.1 and TRT 10.5.0.18 on RTX 3060-12G / RTX 4060 Ti-16G
Description
I collected inference-time statistics for a segmentation model (BiSeNetV2) on my machine with TRT 8.5.3.1 and TRT 10.5.0.18, and found a large difference in inference speed between the two versions: for the same model, "trtexec --loadEngine" reports 5.8 ms with TRT 8 but 9.0 ms with TRT 10. That doesn't look right; I need your help. Thanks!
Environment
TensorRT Version: 8.5.3.1 / 10.5.0.18
NVIDIA GPU: RTX 3060 - 12G
NVIDIA Driver Version: 536.23
CUDA Version: V11.6
CUDNN Version: V6.5.0
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link: https://github.com/CoinCheung/BiSeNet/releases/tag/0.0.0
Steps To Reproduce
Commands or scripts:
- For TRT8: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet8.trt --fp16
Running trtexec.exe --loadEngine=./besnet8.trt gives: "GPU Compute Time: min = 5.3894 ms, max = 7.0011 ms, mean = 5.82974 ms, median = 5.73438 ms, percentile(90%) = 6.32324 ms, percentile(95%) = 6.56079 ms, percentile(99%) = 7.0011 ms"
- For TRT10: trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --saveEngine=./besnet10.trt --fp16
Running trtexec.exe --loadEngine=./besnet10.trt gives: "GPU Compute Time: min = 8.03729 ms, max = 10.4243 ms, mean = 9.02904 ms, median = 8.96878 ms, percentile(90%) = 9.50806 ms, percentile(95%) = 9.74951 ms, percentile(99%) = 10.4243 ms"
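For a strictly like-for-like measurement, a minimal sketch (all flags below are standard trtexec options; the engine file names match the commands above) that pins both runs to the same input shape, excludes host/device copies from "GPU Compute Time", and fixes the warm-up time and iteration count:

trtexec.exe --loadEngine=./besnet8.trt --shapes=input_image:8x3x640x640 --noDataTransfers --warmUp=500 --iterations=200
trtexec.exe --loadEngine=./besnet10.trt --shapes=input_image:8x3x640x640 --noDataTransfers --warmUp=500 --iterations=200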
Have you tried the latest release?: Not yet.
Can this model run on other frameworks? For example, run the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt):
By the way, the inference results are consistent between the two versions.
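For completeness, a hedged sketch of that cross-framework check (assuming polygraphy is installed; the TRT shape flags mirror the build commands above):

polygraphy run BiSeNet-master\BiSeNet-master\model.onnx --trt --onnxrt --fp16 --trt-min-shapes input_image:[1,3,640,640] --trt-opt-shapes input_image:[8,3,640,640] --trt-max-shapes input_image:[8,3,640,640]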
For TRT 10.5.0.18, you should add the flag --builderOptimizationLevel=5.
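Concretely, with the same model and shapes as in the reproduction steps, the rebuild would look like this (only --builderOptimizationLevel=5 is new relative to the original command):

trtexec.exe --onnx=BiSeNet-master\BiSeNet-master\model.onnx --minShapes=input_image:1x3x640x640 --optShapes=input_image:8x3x640x640 --maxShapes=input_image:8x3x640x640 --builderOptimizationLevel=5 --saveEngine=./besnet10.trt --fp16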
I tried your suggestion, but didn't get any performance improvement.
Try the latest version of TRT.
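If the gap persists on the latest release, per-layer timings may show where the regression comes from; a sketch using trtexec's profiling flags (the output file names are placeholders):

trtexec.exe --loadEngine=./besnet8.trt --shapes=input_image:8x3x640x640 --dumpProfile --separateProfileRun --exportProfile=profile_trt8.json
trtexec.exe --loadEngine=./besnet10.trt --shapes=input_image:8x3x640x640 --dumpProfile --separateProfileRun --exportProfile=profile_trt10.json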
Hello @gaoyu-cao! I don't see any ONNX models at the model link you provided. Can you verify?