Improved yolov6 has the same TRT model inference speed under int8 and fp16
Description
I improved yolov6. After converting to TensorRT, the improved model's inference is faster than the original yolov6 under fp32 and fp16. However, after converting to int8, the original yolov6 nearly doubles in speed (fp16: 66 fps -> int8: 122 fps), while the improved yolov6 barely changes (fp16: 100 fps -> int8: 102 fps). What is the reason? My improvement module includes Split, Concat, and DropPath operations. The ONNX opset is set to 12 because 13 reports an error.
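For reference, a minimal export sketch; the checkpoint layout and tensor names here are assumptions based on the reproduction command below, not the actual export_onnx.py internals. One plausible cause of the opset-13 error is that opset 13 moved Split's split sizes from an attribute to a second input, which older exporter/parser paths may not handle:

```python
import torch

# Assumed checkpoint layout (a "model" key in the .pt file); adjust to
# match what export_onnx.py actually loads.
ckpt = torch.load("runs/train/yolov6-fast/weights/best_ckpt.pt",
                  map_location="cpu")
model = ckpt["model"].float().eval()
dummy = torch.zeros(1, 3, 640, 640)

# opset 12 still encodes Split sizes as an attribute; opset 13 makes
# them a separate input, which can break older toolchains.
torch.onnx.export(model, dummy, "best_ckpt.onnx",
                  opset_version=12,
                  input_names=["images"],
                  output_names=["outputs"])
```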
Environment
TensorRT Version: 7.1.3.0
CUDA Version: 10.2
Operating System: NVIDIA Jetson Xavier NX
PyTorch Version (if applicable): 1.11
Steps To Reproduce
Commands or scripts:
1. PC: python ./deploy/ONNX/export_onnx.py --weights runs/train/yolov6-fast/weights/best_ckpt.pt --device 1 --simplify
2. NX: python3 onnx_to_trt.py -m ./weights/best_ckpt.onnx -d int8

Have you tried the latest release?: No
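For context, here is a rough sketch of what an int8 build looks like with the TensorRT 7 Python API. This approximates, but is not, the actual onnx_to_trt.py; the calibrator argument is assumed to be some IInt8EntropyCalibrator2 implementation you provide:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def build_int8_engine(onnx_path, calibrator):
    # Parse the (simplified) ONNX file into an explicit-batch network.
    builder = trt.Builder(TRT_LOGGER)
    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28   # 256 MiB; tune for the NX
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)  # allow fp16 fallback tactics
    config.int8_calibrator = calibrator
    return builder.build_engine(network, config)
```

With both the INT8 and FP16 flags set, TensorRT is free to pick fp16/fp32 tactics for any layer where int8 is unsupported or slower, which is exactly the mixed-precision behavior visible in the log below.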
TRT 7.1 is too old, could you please upgrade to the latest JetPack release?
I printed the model conversion information on the NX and found that the layers before and after the Split operation run in fp16, and some even in fp32. On the PC, converting with TensorRT 8 turned all layers of the improved model into int8. Since upgrading TensorRT on the NX is too complicated, how can I get Split to run in int8 on TensorRT 7.1.3.0? Below is some of the information printed during model conversion (a sketch of forcing per-layer precision follows the log):
Engine Layer Information:
[TensorRT] VERBOSE: Layer(Reformat): Conv_0 + Relu_1 input reformatter 0, Tactic: 0, images[Float(3,640,640)] -> Conv_0 + Relu_1 reformatted input 0[Int8(3,640,640)]
[TensorRT] VERBOSE: Layer(icudnn): Conv_0 + Relu_1, Tactic: -6282183216199417697, Conv_0 + Relu_1 reformatted input 0[Int8(3,640,640)] -> onnx::Conv_138[Int8(32,320,320)]
[TensorRT] VERBOSE: Layer(icudnn): Conv_2 + Relu_3, Tactic: -9204333525109552344, onnx::Conv_138[Int8(32,320,320)] -> Conv_2 + Relu_3 output to be reformatted 0[Float(64,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Conv_2 + Relu_3 output reformatter 0, Tactic: 0, Conv_2 + Relu_3 output to be reformatted 0[Float(64,160,160)] -> onnx::Split_140[Half(64,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Split_4, Tactic: 0, onnx::Split_140[Half(16,160,160)] -> input.8[Half(16,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Conv_5 input reformatter 0, Tactic: 0, input.8[Half(16,160,160)] -> Conv_5 reformatted input 0[Int8(16,160,160)]
[TensorRT] VERBOSE: Layer(icudnn): Conv_5, Tactic: 8047041638267142825, Conv_5 reformatted input 0[Int8(16,160,160)] -> Conv_5 output to be reformatted 0[Int8(16,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Conv_5 output reformatter 0, Tactic: 0, Conv_5 output to be reformatted 0[Int8(16,160,160)] -> input.12[Half(16,160,160)]
[TensorRT] VERBOSE: Layer(Reformat): Split_4_1, Tactic: 0, onnx::Split_140[Half(48,160,160)] -> input.12[Half(48,160,160)]
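The log above shows where the overhead comes from: Split_4 runs in Half, so TensorRT inserts Reformat layers to convert between Int8 and Half/Float around it and around Conv_5, and those conversions eat most of the int8 speedup. On TRT 7 you can try requesting int8 per layer and forbidding implicit fallback. A minimal sketch, assuming the TensorRT Python API; the layer-name matching is hypothetical (inspect network.get_layer(i).name for the real names), and this is untested on 7.1.3.0:

```python
import tensorrt as trt

def force_int8_layers(network, config):
    # Request int8 and make TensorRT obey per-layer precision requests
    # instead of silently falling back to fp16/fp32.
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.STRICT_TYPES)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Hypothetical name filter: target the Split and the convs around it.
        if "Split" in layer.name or "Conv" in layer.name:
            layer.precision = trt.int8
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.int8)
```

If Split simply has no int8 implementation in TRT 7.1, STRICT_TYPES will make the build fail rather than speed it up; in that case the practical options are to rework the Split in the PyTorch model into ops that do run in int8, or to upgrade TensorRT via JetPack as suggested above.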
Sorry, you have to upgrade to TRT 8 via JetPack. I will close this since there has been no activity for a long time.