FastDeploy: YOLO CUDA preprocessing util and YOLOv5 CUDA preprocessing


wang-xinyu opened this pull request.

PR types

Performance optimization

PR changes

Others - preprocessing

Describe

Add a YOLO CUDA preprocessing util
YOLOv5: integrate CUDA preprocessing
CMake: support compiling CUDA source files
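The description does not show the util's interface. As a hedged illustration only: YOLO preprocessing commonly finishes by swapping BGR to RGB, scaling pixel values to [0, 1], and converting the interleaved HWC layout to planar CHW, and a CUDA implementation typically fuses these steps so that one thread handles one output pixel. The pure-Python reference below assumes that per-pixel work; `hwc_bgr_to_chw_rgb` is a hypothetical name, not a FastDeploy API.

```python
# Hypothetical CPU reference for illustration; not FastDeploy's API or CUDA code.
def hwc_bgr_to_chw_rgb(pixels, height, width):
    """Convert a flat HWC BGR uint8 buffer to a flat CHW RGB float buffer in [0, 1].

    pixels: flat list of length height * width * 3, laid out as [B, G, R, B, G, R, ...].
    """
    plane = height * width
    out = [0.0] * (3 * plane)
    # Each iteration is the work a single GPU thread would do for one pixel.
    for idx in range(plane):
        b, g, r = pixels[3 * idx], pixels[3 * idx + 1], pixels[3 * idx + 2]
        out[idx] = r / 255.0              # R plane
        out[plane + idx] = g / 255.0      # G plane
        out[2 * plane + idx] = b / 255.0  # B plane
    return out
```

For example, `hwc_bgr_to_chw_rgb([0, 0, 255, 255, 0, 0], 1, 2)` (one red pixel, then one blue pixel) returns `[1.0, 0.0, 0.0, 0.0, 0.0, 1.0]`: the R plane first, then G, then B.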

Oct 14 '22 11:10 wang-xinyu

All committers have signed the CLA.

Oct 14 '22 11:10 CLAassistant

Latency is end-to-end (preprocessing, inference, and postprocessing), in milliseconds. Tested on an NVIDIA Tesla P40 with TensorRT 8.4.

Model | Latency, CPU preprocessing (ms) | Latency, CUDA preprocessing (ms) | Reduction
--- | --- | --- | ---
yolov5s | 41 | 28 | 31.7% ↓
yolov5lite | 40 | 22 | 45% ↓
yolov6s | 25 | 11 | 56% ↓
yolov7 | 47 | 32 | 31.9% ↓
yolov7_e2e | 27 | 16 | 40.7% ↓
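The reduction figures in the table above are consistent with the relative drop in end-to-end latency, (CPU - CUDA) / CPU. The short Python check below recomputes them; the dictionary simply restates the table's measurements.

```python
# Latencies in ms, restated from the table: (CPU preprocessing, CUDA preprocessing).
LATENCIES_MS = {
    "yolov5s": (41, 28),
    "yolov5lite": (40, 22),
    "yolov6s": (25, 11),
    "yolov7": (47, 32),
    "yolov7_e2e": (27, 16),
}


def latency_reduction_pct(cpu_ms, cuda_ms):
    """Relative end-to-end latency reduction in percent, rounded to one decimal."""
    return round((cpu_ms - cuda_ms) / cpu_ms * 100, 1)


for model, (cpu_ms, cuda_ms) in LATENCIES_MS.items():
    print(f"{model}: {latency_reduction_pct(cpu_ms, cuda_ms)}% lower")
```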

Oct 18 '22 12:10 wang-xinyu

This CUDA preprocessing for YOLO uses a warp-affine transform to do the resizing, which differs slightly from cv::resize(), so the mAP differs slightly as well. The mAP (IoU=0.50:0.95, area=all) results below were measured on coco_val_2017 (5000 images) with TensorRT models.

Model | mAP (CPU preprocessing) | mAP (CUDA preprocessing)
--- | --- | ---
yolov5s | 0.372 | 0.368
yolov6s | 0.424 | 0.418
yolov7 | 0.514 | 0.498
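The thread does not spell out the kernel's exact mapping. The pure-Python sketch below assumes a common letterbox formulation: one scale, `min(dst_w / src_w, dst_h / src_h)`, a centered translation, inverse mapping of each destination pixel center into the source, and bilinear sampling with a constant pad value of 114 (an assumed YOLO-style gray). All names and the pixel-center convention are hypothetical; this is a CPU illustration, not FastDeploy's CUDA code.

```python
import math

# Hypothetical CPU reference for illustration; names and conventions are assumptions,
# not FastDeploy's CUDA implementation.


def letterbox_affine(src_w, src_h, dst_w, dst_h):
    """Return (scale, tx, ty) for an aspect-preserving, centered resize: dst = scale * src + t."""
    scale = min(dst_w / src_w, dst_h / src_h)
    tx = (dst_w - scale * src_w) / 2
    ty = (dst_h - scale * src_h) / 2
    return scale, tx, ty


def sample_bilinear(img, w, h, x, y, pad):
    """Bilinear sample of a row-major single-channel image; neighbours outside it read `pad`."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(ix, iy):
        return img[iy * w + ix] if 0 <= ix < w and 0 <= iy < h else pad

    top = px(x0, y0) * (1 - fx) + px(x0 + 1, y0) * fx
    bottom = px(x0, y0 + 1) * (1 - fx) + px(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def warp_affine_letterbox(img, src_w, src_h, dst_w, dst_h, pad=114.0):
    """Resize and pad in one pass by inverse-mapping each destination pixel center."""
    scale, tx, ty = letterbox_affine(src_w, src_h, dst_w, dst_h)
    out = []
    for dy in range(dst_h):
        for dx in range(dst_w):
            # On a GPU, this loop body would typically be one thread's work.
            sx = (dx + 0.5 - tx) / scale - 0.5
            sy = (dy + 0.5 - ty) / scale - 0.5
            out.append(sample_bilinear(img, src_w, src_h, sx, sy, pad))
    return out
```

For example, `warp_affine_letterbox([200, 200, 200, 200], 2, 2, 4, 2)` gives scale 1 and a horizontal offset of 1, returning `[114.0, 200.0, 200.0, 114.0, 114.0, 200.0, 200.0, 114.0]`. Padding is folded into the same pass: destination pixels that map outside the source read the pad value. In this sketch, pixels at the edge of the image region can also blend with the pad value when the scale is fractional; that is one plausible, unconfirmed source of small per-pixel differences from cv::resize() followed by padding.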

Oct 19 '22 03:10 wang-xinyu
