FastDeploy

YOLO CUDA preprocessing util and YOLOv5 CUDA preprocessing

Open · wang-xinyu opened this issue 1 year ago · 1 comment

PR types

Performance optimization

PR changes

Others - preprocessing

Describe

  • Add a YOLO CUDA preprocessing util
  • YOLOv5: integrate CUDA preprocessing
  • CMake changes to support compiling CUDA source files
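To make the first bullet more concrete, here is a CPU reference sketch in C++ of the per-pixel work a fused letterbox preprocessing kernel of this kind might do: map each output pixel back into the source image with an inverse affine transform, sample bilinearly, fill the border with gray padding, swap BGR to RGB, scale to [0, 1], and write planar CHW output. This is not the code in this PR. The names `Letterbox`, `make_letterbox`, `preprocess_pixel` and `preprocess`, the padding value 114, and the input/output layouts are all assumptions for illustration. On a GPU, each call to `preprocess_pixel` would correspond to one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical letterbox parameters: dst = r * src + (ox, oy), image centered.
struct Letterbox {
  float r, ox, oy;
};

Letterbox make_letterbox(int src_w, int src_h, int dst_w, int dst_h) {
  float r = std::min(dst_w / static_cast<float>(src_w),
                     dst_h / static_cast<float>(src_h));
  return {r, (dst_w - r * src_w) * 0.5f, (dst_h - r * src_h) * 0.5f};
}

// Body of one (hypothetical) CUDA thread: computes output pixel (dx, dy).
// src: interleaved BGR uint8 (HWC). dst: planar RGB float32 in [0, 1] (CHW).
void preprocess_pixel(const uint8_t* src, int src_w, int src_h, float* dst,
                      int dst_w, int dst_h, const Letterbox& lb, int dx,
                      int dy) {
  // Inverse affine: map the output pixel center into source edge coordinates,
  // where the source image spans [0, src_w) x [0, src_h).
  float ex = (dx + 0.5f - lb.ox) / lb.r;
  float ey = (dy + 0.5f - lb.oy) / lb.r;
  float bgr[3] = {114.f, 114.f, 114.f};  // assumed gray padding value
  if (ex >= 0.f && ex < src_w && ey >= 0.f && ey < src_h) {
    // Bilinear interpolation between the four nearest source pixel centers.
    float sx = std::min(std::max(ex - 0.5f, 0.f), src_w - 1.f);
    float sy = std::min(std::max(ey - 0.5f, 0.f), src_h - 1.f);
    int x0 = static_cast<int>(sx), y0 = static_cast<int>(sy);
    int x1 = std::min(x0 + 1, src_w - 1), y1 = std::min(y0 + 1, src_h - 1);
    float ax = sx - x0, ay = sy - y0;
    auto px = [&](int x, int y, int c) {
      return static_cast<float>(src[(y * src_w + x) * 3 + c]);
    };
    for (int c = 0; c < 3; ++c) {
      float top = (1 - ax) * px(x0, y0, c) + ax * px(x1, y0, c);
      float bot = (1 - ax) * px(x0, y1, c) + ax * px(x1, y1, c);
      bgr[c] = (1 - ay) * top + ay * bot;
    }
  }
  int area = dst_w * dst_h;
  for (int c = 0; c < 3; ++c) {
    // Output plane 0 is R, i.e. source channel 2 (BGR -> RGB swap).
    dst[c * area + dy * dst_w + dx] = bgr[2 - c] / 255.0f;
  }
}

// Host-side reference: on the GPU this double loop becomes a 2D grid with
// one thread per output pixel.
std::vector<float> preprocess(const std::vector<uint8_t>& src, int src_w,
                              int src_h, int dst_w, int dst_h) {
  assert(src.size() == static_cast<std::size_t>(src_w) * src_h * 3);
  std::vector<float> dst(static_cast<std::size_t>(3) * dst_w * dst_h);
  Letterbox lb = make_letterbox(src_w, src_h, dst_w, dst_h);
  for (int dy = 0; dy < dst_h; ++dy)
    for (int dx = 0; dx < dst_w; ++dx)
      preprocess_pixel(src.data(), src_w, src_h, dst.data(), dst_w, dst_h,
                       lb, dx, dy);
  return dst;
}
```

Doing resize, padding, color swap, normalization and layout change in a single pass means each output value is computed and written once, which is a plausible reason a fused GPU version saves time compared with a chain of separate CPU steps.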

wang-xinyu · Oct 14 '22 11:10

CLA assistant check
All committers have signed the CLA.

CLAassistant · Oct 14 '22 11:10

Latency (ms) covers preprocessing, inference, and postprocessing. Tested on an NVIDIA Tesla P40 with TensorRT 8.4.

| Model | Latency, CPU preprocessing (ms) | Latency, CUDA preprocessing (ms) | Latency reduction |
|---|---|---|---|
| yolov5s | 41 | 28 | 31.7% ↓ |
| yolov5lite | 40 | 22 | 45% ↓ |
| yolov6s | 25 | 11 | 56% ↓ |
| yolov7 | 47 | 32 | 31.9% ↓ |
| yolov7_e2e | 27 | 16 | 40.7% ↓ |
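The last column is the relative latency reduction, (CPU - CUDA) / CPU, rounded to one decimal place; for yolov5s that is (41 - 28) / 41 ≈ 31.7%. A small C++ helper that reproduces the column from the two latency columns (the function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Relative latency reduction in percent: (cpu - cuda) / cpu * 100.
double latency_reduction_pct(double cpu_ms, double cuda_ms) {
  assert(cpu_ms > 0.0);
  return (cpu_ms - cuda_ms) / cpu_ms * 100.0;
}

// Round to one decimal place, matching how the table reports percentages.
double round1(double v) { return std::round(v * 10.0) / 10.0; }
```

For example, `round1(latency_reduction_pct(25, 11))` gives 56 for yolov6s, and `round1(latency_reduction_pct(27, 16))` gives 40.7 for yolov7_e2e.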

wang-xinyu · Oct 18 '22 12:10

This CUDA preprocessing for YOLO uses a warp-affine transform for resizing, which differs slightly from cv::resize(), so the resulting mAP also differs slightly. The mAP (IoU=0.50:0.95 | area=all) results below were measured on coco_val_2017 (5,000 images) with TensorRT models.

| Model | mAP (CPU preprocessing) | mAP (CUDA preprocessing) |
|---|---|---|
| yolov5s | 0.372 | 0.368 |
| yolov6s | 0.424 | 0.418 |
| yolov7 | 0.514 | 0.498 |
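The mAP gap is attributed above to warp-affine resizing versus cv::resize(). One plausible contributor is geometric: a resize-then-pad letterbox rounds the resized size and the padding to whole pixels, while a single affine transform uses the exact scale and a fractional centering offset, so the same output row can sample a slightly different source row. The sketch below is a simplified model under those assumptions (pixel-center sampling, top padding computed as `(dst - new_h) / 2` with integer division); it is not FastDeploy's code, the real implementations may use other rounding conventions, and interpolation details can differ as well. `RowMapping` and `map_output_row` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Source row (pixel-center coordinates) sampled for output row dy under two
// simplified letterbox models. Both models are illustrative assumptions.
struct RowMapping {
  double resize_then_pad;  // rounded resize height + integer top padding
  double warp_affine;      // exact scale + fractional centering offset
};

RowMapping map_output_row(int src_w, int src_h, int dst, int dy) {
  assert(src_w > 0 && src_h > 0 && dst > 0);
  double r = std::min(static_cast<double>(dst) / src_w,
                      static_cast<double>(dst) / src_h);

  // (a) Resize to an integer height, then pad with whole rows on top.
  int new_h = static_cast<int>(std::lround(src_h * r));
  int pad_top = (dst - new_h) / 2;
  double a = (dy + 0.5 - pad_top) * src_h / static_cast<double>(new_h) - 0.5;

  // (b) One affine transform, dst = r * src + oy, with a fractional oy.
  double oy = (dst - r * src_h) / 2.0;
  double b = (dy + 0.5 - oy) / r - 0.5;
  return {a, b};
}
```

Under these assumptions, a 1280x720 frame letterboxed to 640x640 yields identical source rows in both models, whereas a 1000x667 image yields source rows about 0.78 pixels apart for output row 320. Sub-pixel shifts like this change the interpolated input slightly, which is consistent with the small mAP differences in the table.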

wang-xinyu · Oct 19 '22 03:10