Improved yolov6 has the same TRT model inference speed under int8 and fp16
Description
I improved yolov6. After converting to TensorRT, the improved model's inference is faster than the original yolov6 under fp32 and fp16. However, after converting to int8, the original yolov6 nearly doubles in speed (fp16: 66 fps -> int8: 122 fps), while the improved yolov6 barely changes (fp16: 100 fps -> int8: 102 fps). What is the reason? My improvement module includes Split, Concat, and DropPath operations. The ONNX opset is set to 12 because 13 reports an error.
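For reference, a minimal export sketch; the checkpoint layout and tensor names here are assumptions based on the reproduction command below, not the actual export_onnx.py internals. One plausible cause of the opset-13 error is that opset 13 moved Split's split sizes from an attribute to a second input, which older exporter/parser paths may not handle:

```python
import torch

# Assumed checkpoint layout (a "model" key in the .pt file); adjust to
# match what export_onnx.py actually loads.
ckpt = torch.load("runs/train/yolov6-fast/weights/best_ckpt.pt",
                  map_location="cpu")
model = ckpt["model"].float().eval()
dummy = torch.zeros(1, 3, 640, 640)

# opset 12 still encodes Split sizes as an attribute; opset 13 makes
# them a separate input, which can break older toolchains.
torch.onnx.export(model, dummy, "best_ckpt.onnx",
                  opset_version=12,
                  input_names=["images"],
                  output_names=["outputs"])
```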
Environment
TensorRT Version: 7.1.3.0
CUDA Version: 10.2
Operating System: NVIDIA Jetson Xavier NX
PyTorch Version (if applicable): 1.11
Steps To Reproduce
Commands or scripts:
1. PC: python ./deploy/ONNX/export_onnx.py --weights runs/train/yolov6-fast/weights/best_ckpt.pt --device 1 --simplify
2. NX: python3 onnx_to_trt.py -m ./weights/best_ckpt.onnx -d int8

Have you tried the latest release?: No
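For context, here is a rough sketch of what an int8 build looks like with the TensorRT 7 Python API. This approximates, but is not, the actual onnx_to_trt.py; the calibrator argument is assumed to be some IInt8EntropyCalibrator2 implementation you provide:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def build_int8_engine(onnx_path, calibrator):
    # Parse the (simplified) ONNX file into an explicit-batch network.
    builder = trt.Builder(TRT_LOGGER)
    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28   # 256 MiB; tune for the NX
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)  # allow fp16 fallback tactics
    config.int8_calibrator = calibrator
    return builder.build_engine(network, config)
```

With both the INT8 and FP16 flags set, TensorRT is free to pick fp16/fp32 tactics for any layer where int8 is unsupported or slower, which is exactly the mixed-precision behavior visible in the log below.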
TRT 7.1 is too old, could you please upgrade to the latest JetPack release?
I printed the model conversion information on the NX and found that the layers before and after the Split operation run in fp16, and some even in fp32. On the PC, converting with TensorRT 8 turned all layers of the improved model into int8. Since upgrading TensorRT on the NX is too complicated, how can I get Split to run in int8 on TensorRT 7.1.3.0? Below is some of the information printed during model conversion (a sketch of forcing per-layer precision follows the log):
Engine Layer Information:
[TensorRT] VERBOSE: Layer(Reformat): Conv_0 + Relu_1 input reformatter 0, Tactic: 0, images[Float(3,640,640)] -> Conv_0 + Relu_1 reformatted input 0[Int8(3,640,640)]
[TensorRT] VERBOSE: Layer(icudnn): Conv_0 + Relu_1, Tactic: -6282183216199417697, Conv_0 + Relu_1 reformatted input 0[Int8(3,640,640)] -> onnx::Conv_138[Int8(32,320,320)]
[TensorRT] VERBOSE: Layer(icudnn): Conv_2 + Relu_3, Tactic: -9204333525109552344, onnx::Conv_138[Int8(32,320,320)] -> Conv_2 + Relu_3 output to be reformatted 0[Float(64,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Conv_2 + Relu_3 output reformatter 0, Tactic: 0, Conv_2 + Relu_3 output to be reformatted 0[Float(64,160,160)] -> onnx::Split_140[Half(64,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Split_4, Tactic: 0, onnx::Split_140[Half(16,160,160)] -> input.8[Half(16,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Conv_5 input reformatter 0, Tactic: 0, input.8[Half(16,160,160)] -> Conv_5 reformatted input 0[Int8(16,160,160)]
[TensorRT] VERBOSE: Layer(icudnn): Conv_5, Tactic: 8047041638267142825, Conv_5 reformatted input 0[Int8(16,160,160)] -> Conv_5 output to be reformatted 0[Int8(16,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Conv_5 output reformatter 0, Tactic: 0, Conv_5 output to be reformatted 0[Int8(16,160,160)] -> input.12[Half(16,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Split_4_1, Tactic: 0, onnx::Split_140[Half(48,160,160)] -> input.12[Half(48,160,160)]
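The log above shows where the overhead comes from: Split_4 runs in Half, so TensorRT inserts Reformat layers to convert between Int8 and Half/Float around it and around Conv_5, and those conversions eat most of the int8 speedup. On TRT 7 you can try requesting int8 per layer and forbidding implicit fallback. A minimal sketch, assuming the TensorRT Python API; the layer-name matching is hypothetical (inspect network.get_layer(i).name for the real names), and this is untested on 7.1.3.0:

```python
import tensorrt as trt

def force_int8_layers(network, config):
    # Request int8 and make TensorRT obey per-layer precision requests
    # instead of silently falling back to fp16/fp32.
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.STRICT_TYPES)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Hypothetical name filter: target the Split and the convs around it.
        if "Split" in layer.name or "Conv" in layer.name:
            layer.precision = trt.int8
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.int8)
```

If Split simply has no int8 implementation in TRT 7.1, STRICT_TYPES will make the build fail rather than speed it up; in that case the practical options are to rework the Split in the PyTorch model into ops that do run in int8, or to upgrade TensorRT via JetPack as suggested above.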
Sorry, you have to upgrade to TRT 8 via JetPack. I will close this since there has been no activity for a long time.