[Backend] OpenCV_CUDA preprocessors and PPClas preprocess optimization

Opened by wang-xinyu

PR types

Backend

Description

  • Implemented OpenCV_CUDA preprocessors: BGR2RGB, ResizeByShort, CenterCrop, Normalize, NormalizeAndPermute, HWC2CHW.
  • Fused preprocessors (BGR2RGB/RGB2BGR + Normalize + HWC2CHW).
  • Integrated OpenCV_CUDA preprocessing into PPClas.
  • Integrated preprocessor fusion into PPClas.
  • Ran end-to-end performance tests on PPClas.
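The fusion item above combines three per-pixel passes into one. A minimal pure-Python sketch of the idea, assuming an interleaved 3-channel HWC uint8 buffer stored as a flat list and a `(x / 255 - mean) / std` normalization; the function names are hypothetical and this is not FastDeploy's C++/CUDA implementation. The unfused path writes two intermediate buffers, while the fused path reads each input byte once and writes it straight to its normalized slot in the planar RGB output.

```python
from typing import List, Sequence

# Hypothetical helpers; images are interleaved HWC uint8 buffers (flat lists), 3 channels.

def bgr2rgb(img: Sequence[int], h: int, w: int) -> List[int]:
    """Swap channels 0 and 2 of every pixel (writes a new HWC buffer)."""
    out = list(img)
    for p in range(h * w):
        out[3 * p], out[3 * p + 2] = img[3 * p + 2], img[3 * p]
    return out


def normalize(img: Sequence[int], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """(x / 255 - mean[c]) / std[c] on an HWC buffer (writes a new float buffer)."""
    return [(v / 255.0 - mean[i % 3]) / std[i % 3] for i, v in enumerate(img)]


def hwc2chw(img: Sequence[float], h: int, w: int) -> List[float]:
    """Transpose interleaved HWC into planar CHW (writes a third buffer)."""
    out = [0.0] * (3 * h * w)
    for p in range(h * w):
        for c in range(3):
            out[c * h * w + p] = img[3 * p + c]
    return out


def unfused(img, h, w, mean, std):
    # Three passes over the image, two intermediate buffers.
    return hwc2chw(normalize(bgr2rgb(img, h, w), mean, std), h, w)


def fused_bgr2rgb_normalize_hwc2chw(img, h, w, mean, std):
    # One pass, no intermediate buffers.
    plane = h * w
    out = [0.0] * (3 * plane)
    for p in range(plane):
        for c_out in range(3):
            c_in = 2 - c_out  # RGB channel c_out comes from BGR channel 2 - c_out
            v = img[3 * p + c_in]
            out[c_out * plane + p] = (v / 255.0 - mean[c_out]) / std[c_out]
    return out


# Usage: a 2x2 BGR image; mean/std are the common ImageNet values (an assumption here).
mean, std = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)
img = list(range(12))
assert fused_bgr2rgb_normalize_hwc2chw(img, 2, 2, mean, std) == unfused(img, 2, 2, mean, std)
```

Both paths perform the same arithmetic per element, so their outputs match exactly; the fused version only saves memory traffic and buffer allocations, which is where a fused kernel gets its benefit.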

Experiment

Setup: PPClas ResNet50 with 224x224 input, on an NVIDIA P40 GPU and an Intel Xeon E5-2650 CPU. End-to-end test (preprocessing + inference + postprocessing) using INTER_LINEAR resizing. Latency is in milliseconds.
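To see what the ResizeByShort and CenterCrop steps do to the test images, here is a small geometry sketch. It assumes a short-side target of 256 (a common ImageNet evaluation setting, not stated in this PR) and round-to-nearest sizing; both are assumptions, and the helper names are hypothetical. Only the 224x224 crop comes from the setup above.

```python
from typing import Tuple


def resize_by_short(w: int, h: int, target_short: int) -> Tuple[int, int]:
    """Scale so the shorter side becomes target_short, keeping the aspect ratio.
    Rounding to the nearest integer is an assumption."""
    scale = target_short / min(w, h)
    return round(w * scale), round(h * scale)


def center_crop_box(w: int, h: int, crop_w: int, crop_h: int) -> Tuple[int, int, int, int]:
    """(x0, y0, x1, y1) of a crop_w x crop_h window centered in a w x h image."""
    if crop_w > w or crop_h > h:
        raise ValueError("crop window is larger than the image")
    x0 = (w - crop_w) // 2
    y0 = (h - crop_h) // 2
    return x0, y0, x0 + crop_w, y0 + crop_h


RESIZE_SHORT = 256  # assumed config value, not from the PR
CROP = 224          # input size from the experiment setup

for w, h in [(2560, 1440), (1920, 1080), (1280, 720), (640, 480)]:
    rw, rh = resize_by_short(w, h, RESIZE_SHORT)
    print(f"{w}x{h} -> {rw}x{rh} -> crop {center_crop_box(rw, rh, CROP, CROP)}")
```

Under these assumptions, all three 16:9 inputs shrink to 455x256 and the 4:3 input to 341x256, so every step after ResizeByShort operates on images of nearly the same size regardless of the original resolution.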

| Optimization \ Image size | 2560x1440 | 1920x1080 | 1280x720 | 640x480 |
|---|---|---|---|---|
| Original | 9.72 | 8.42 | 6.24 | 5.40 |
| Reuse input & output tensors | 8.83 | 7.30 | 6.09 | 5.14 |
| Reuse input & output tensors + fuse BGR2RGB + Normalize + HWC2CHW | 5.48 | 5.36 | 4.67 | 4.37 |
| Reuse input & output tensors + fuse BGR2RGB + Normalize + HWC2CHW + OpenCV_CUDA | 6.12 | 5.26 | 4.33 | 3.89 |
| Reuse input & output tensors + fuse BGR2RGB + Normalize + HWC2CHW + OpenCV_CUDA, with resize on CPU pinned memory | 4.91 | 4.36 | 4.35 | 4.11 |
| Speedup (Original vs. last row) | 1.98x | 1.93x | 1.43x | 1.31x |
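The speedup figures can be checked against the latency table. This sketch copies the measured latencies (row labels abbreviated) and computes per-size ratios; the numbers match Original divided by the last row, not the fastest configuration at each size.

```python
from typing import Dict, Sequence, Tuple

SIZES = ("2560x1440", "1920x1080", "1280x720", "640x480")

# Latency in ms, copied from the table above (labels abbreviated).
LATENCY_MS: Dict[str, Tuple[float, ...]] = {
    "Original": (9.72, 8.42, 6.24, 5.40),
    "Reuse tensors": (8.83, 7.30, 6.09, 5.14),
    "Reuse + fuse": (5.48, 5.36, 4.67, 4.37),
    "Reuse + fuse + OpenCV_CUDA": (6.12, 5.26, 4.33, 3.89),
    "Reuse + fuse + OpenCV_CUDA, CPU resize (pinned)": (4.91, 4.36, 4.35, 4.11),
}


def speedup(baseline: Sequence[float], optimized: Sequence[float]) -> Tuple[float, ...]:
    """Per-size latency ratio baseline / optimized, rounded to 2 decimals."""
    return tuple(round(b / o, 2) for b, o in zip(baseline, optimized))


def fastest(col: int) -> str:
    """Name of the configuration with the lowest latency for one image size."""
    return min(LATENCY_MS, key=lambda name: LATENCY_MS[name][col])


last = LATENCY_MS["Reuse + fuse + OpenCV_CUDA, CPU resize (pinned)"]
print(speedup(LATENCY_MS["Original"], last))
for i, size in enumerate(SIZES):
    print(size, "fastest:", fastest(i))
```

At 1280x720 and 640x480, the OpenCV_CUDA configuration without the CPU resize (4.33 ms and 3.89 ms) is faster than the last row, so the summary speedups describe the last configuration, not the best result per size.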

wang-xinyu, Nov 03 '22 09:11