# TensorRT_EX

## Environments
- Windows 10 laptop
- CPU i7-11375H
- GPU RTX-3060
- Visual Studio 2017
- CUDA 11.1
- TensorRT 8.0.3.4 (unet)
- TensorRT 8.2.0.6 (detr, yolov5s, real-esrgan)
- OpenCV 3.4.5
- Create an `Engine` directory for the engine files
- Create an `Int8_calib_table` directory for the PTQ calibration tables
## Custom plugin example
- Layer for input preprocessing (NHWC -> NCHW, BGR -> RGB, [0, 255] -> [0, 1] normalization)
- plugin_ex1.cpp (plugin sample code)
- preprocess.hpp (plugin definition)
- preprocess.cu (preprocessing CUDA kernel)
- Validation_py/Validation_preproc.py (result validation against PyTorch)
## Classification model

### vgg11 model
- vgg11.cpp
- With preprocessing plugin

### resnet18 model
- resnet18.cpp
- 100 images from the COCO val2017 dataset for PTQ calibration
- All results match PyTorch
- Comparison of average execution time over 100 iterations and GPU memory usage for one 224x224x3 image
|                   | PyTorch | TensorRT | TensorRT | TensorRT |
|-------------------|---------|----------|----------|----------|
| Precision         | FP32    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration [ms] | 4.1     | 1.7      | 0.7      | 0.6      |
| FPS [frame/sec]   | 243     | 590      | 1385     | 1577     |
| Memory [GB]       | 1.551   | 1.288    | 0.941    | 0.917    |
## Semantic Segmentation model
- UNet model (unet.cpp)
- Use TensorRT 8.0.3.4 for the UNet model (version 8.2.0.6 raises an error for this model)
- unet_carvana_scale0.5_epoch1.pth
- Additional preprocessing (resize & letterbox padding) with OpenCV
- Postprocessing (model output to image)
- All results match PyTorch
- Comparison of average execution time over 100 iterations and GPU memory usage for one 512x512x3 image
|                   | PyTorch | PyTorch | TensorRT | TensorRT | TensorRT |
|-------------------|---------|---------|----------|----------|----------|
| Precision         | FP32    | FP16    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration [ms] | 66.21   | 34.58   | 40.81    | 13.52    | 8.19     |
| FPS [frame/sec]   | 15      | 29      | 25       | 77       | 125      |
| Memory [GB]       | 3.863   | 2.677   | 1.552    | 1.367    | 1.051    |
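The letterbox step mentioned above (resize while keeping aspect ratio, then pad to the target size) can be sketched as a small geometry helper. The repo does the actual resize/border fill with OpenCV; this illustrative version only computes the scaled size and padding.

```cpp
#include <algorithm>

// Letterbox geometry: scale the source to fit inside dst_w x dst_h while
// preserving aspect ratio, then pad the remainder symmetrically.
// Illustrative helper; the repo performs the resize/padding with OpenCV.
struct Letterbox { int new_w, new_h, pad_x, pad_y; };

Letterbox letterbox(int src_w, int src_h, int dst_w, int dst_h) {
    float r = std::min(dst_w / (float)src_w, dst_h / (float)src_h);
    Letterbox lb;
    lb.new_w = (int)(src_w * r);            // resized width
    lb.new_h = (int)(src_h * r);            // resized height
    lb.pad_x = (dst_w - lb.new_w) / 2;      // left/right border
    lb.pad_y = (dst_h - lb.new_h) / 2;      // top/bottom border
    return lb;
}
```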
## Object Detection model (ViT)
- DETR model (detr_trt.cpp)
- Additional preprocessing (mean/std normalization)
- Postprocessing (draw detection results on the image)
- All results match PyTorch
- Comparison of average execution time over 100 iterations and GPU memory usage for one 500x500x3 image
|                   | PyTorch | PyTorch | TensorRT | TensorRT | TensorRT |
|-------------------|---------|---------|----------|----------|----------|
| Precision         | FP32    | FP16    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration [ms] | 37.03   | 30.71   | 16.40    | 6.07     | 5.30     |
| FPS [frame/sec]   | 27      | 33      | 61       | 165      | 189      |
| Memory [GB]       | 1.563   | 1.511   | 1.212    | 1.091    | 1.005    |
## Object Detection model
- Yolov5s model (yolov5s.cpp)
- Comparison of average execution time over 100 iterations and GPU memory usage for one 640x640x3 resized & padded image
|                   | PyTorch | TensorRT | TensorRT |
|-------------------|---------|----------|----------|
| Precision         | FP32    | FP32     | Int8 (PTQ) |
| Avg duration [ms] | 7.72    | 6.16     | 2.86     |
| FPS [frame/sec]   | 129     | 162      | 350      |
| Memory [GB]       | 1.670   | 1.359    | 0.920    |
## Super-Resolution model
- Real-ESRGAN model (real-esrgan.cpp)
- RealESRGAN_x4plus.pth
- 4x upscaling (448x640x3 -> 1792x2560x3)
- Comparison of average execution time over 100 iterations and GPU memory usage
- [update] RealESRGAN_x2plus model (set OUT_SCALE=2)
|                   | PyTorch | PyTorch | TensorRT | TensorRT |
|-------------------|---------|---------|----------|----------|
| Precision         | FP32    | FP16    | FP32     | FP16     |
| Avg duration [ms] | 4109    | 1936    | 2139     | 737      |
| FPS [frame/sec]   | 0.24    | 0.52    | 0.47     | 1.35     |
| Memory [GB]       | 5.029   | 4.407   | 3.807    | 3.311    |
## Object Detection model 2
- Yolov6s model (yolov6.cpp)
- Comparison of average execution time over 1000 iterations and GPU memory usage (with preprocessing, without NMS, 536x640x3)
|                   | PyTorch | TensorRT | TensorRT | TensorRT |
|-------------------|---------|----------|----------|----------|
| Precision         | FP32    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration [ms] | 20.7    | 10.3     | 3.54     | 2.58     |
| FPS [frame/sec]   | 48.14   | 96.21    | 282.26   | 387.89   |
| Memory [GB]       | 1.582   | 1.323    | 0.956    | 0.913    |
## Object Detection model 3 (in progress)
- Yolov7 model (yolov7.cpp)

## Using the C++ TensorRT model from Python via a DLL
## A typical TensorRT model creation sequence using the TensorRT API
1. Prepare the trained model in the training framework (generate the weight file to be used in TensorRT).
2. Implement the model with the TensorRT API, matching the trained model's structure.
3. Extract the weights from the trained model.
4. Pass the weights appropriately to each layer of the prepared TensorRT model.
5. Build and run.
6. After the TensorRT model is built, the model stream is serialized and saved as an engine file.
7. Subsequent runs load only the engine file for inference (if model parameters or layers are modified, re-run from step 4).
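Steps 3 and 4 above hinge on getting the extracted weights to the right layer by name. A minimal sketch, assuming a simple one-record-per-line text dump ("name count v0 v1 ..."); the repo's actual weight file format may differ.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Load extracted weights into a name -> values map so each TensorRT layer
// can be fed its tensor by name. The text format assumed here is one record
// per line: "<layer name> <element count> <values...>" (illustrative only).
std::map<std::string, std::vector<float>> load_weights(std::istream& in) {
    std::map<std::string, std::vector<float>> weights;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string name;
        size_t count = 0;
        if (!(ls >> name >> count)) continue;  // skip blank/bad lines
        std::vector<float> vals(count);
        for (size_t i = 0; i < count; ++i) ls >> vals[i];
        weights[name] = std::move(vals);
    }
    return weights;
}
```

Each entry's raw pointer and count would then be handed to the corresponding layer when building the network with the TensorRT API.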
## Reference