FastDeploy
[Backend] OpenCV_CUDA preprocessors and PPClas preprocess optimization
PR types
Backend
Description
- Implemented OpenCV_CUDA preprocessors: BGR2RGB, ResizeByShort, CenterCrop, Normalize, NormalizeAndPermute, HWC2CHW.
- Fused preprocessors (BGR2RGB/RGB2BGR + Normalize + HWC2CHW).
- Integrated OpenCV_CUDA preprocessing into PPClas.
- Integrated preprocessor fusion into PPClas.
- End-to-end performance testing of PPClas.
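Fusing BGR2RGB, Normalize and HWC2CHW means each pixel is read once and written once, instead of materializing three intermediate images. The C++ sketch below illustrates the idea on the CPU with a plain `std::vector`. It is not FastDeploy's implementation or its OpenCV_CUDA path; the function name `FusedBgrToRgbNormalizeChw`, the `(x / 255 - mean) / std` normalization and the RGB-ordered `mean`/`std_dev` arrays are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one pass that swaps BGR -> RGB, normalizes with
// (x / 255 - mean) / std, and converts interleaved HWC to planar CHW.
// mean and std_dev are assumed to be given in RGB order.
std::vector<float> FusedBgrToRgbNormalizeChw(const std::vector<uint8_t>& bgr_hwc,
                                             int height, int width,
                                             const float mean[3],
                                             const float std_dev[3]) {
  const std::size_t plane = static_cast<std::size_t>(height) * width;
  assert(bgr_hwc.size() == 3 * plane);  // 3 interleaved uint8 channels
  std::vector<float> chw(3 * plane);

  // Fold the normalization into one multiply-add per element:
  // out = x * alpha + beta, alpha = 1 / (255 * std), beta = -mean / std.
  float alpha[3], beta[3];
  for (int c = 0; c < 3; ++c) {
    alpha[c] = 1.0f / (255.0f * std_dev[c]);
    beta[c] = -mean[c] / std_dev[c];
  }

  for (std::size_t i = 0; i < plane; ++i) {
    for (int c = 0; c < 3; ++c) {
      // Output channel c (RGB order) reads input channel 2 - c (BGR order).
      const float x = bgr_hwc[i * 3 + (2 - c)];
      // Write into plane c at pixel i: this index math is the HWC2CHW step.
      chw[c * plane + i] = x * alpha[c] + beta[c];
    }
  }
  return chw;
}
```

The channel swap and the layout change cost nothing beyond index arithmetic on the read and write, so the only per-element arithmetic left is the multiply-add from the folded mean and std.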
Experiment
End-to-end test (preprocessing + inference + postprocessing) of PPClas with ResNet50 at a 224x224 input size, on an NVIDIA P40 GPU with an Intel Xeon E5-2650 CPU. Resizing uses INTER_LINEAR interpolation. Latencies are in milliseconds.
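Classification pipelines commonly reach a 224x224 input by resizing the shorter side (ResizeByShort) and then center-cropping (CenterCrop). The sketch below computes only that geometry. The shorter-side target of 256, round-to-nearest for the longer side, and the names `ResizeByShortSize` and `CenterCropRect` are hypothetical and not taken from PPClas configs or FastDeploy; the pixel resampling itself (INTER_LINEAR in this test) would be done by OpenCV.

```cpp
#include <cassert>
#include <cmath>

struct Size { int width; int height; };
struct Rect { int x; int y; int width; int height; };

// Hypothetical: scale so the shorter side equals target_short,
// keeping the aspect ratio and rounding to the nearest pixel.
Size ResizeByShortSize(Size in, int target_short) {
  const int short_side = in.width < in.height ? in.width : in.height;
  const double scale = static_cast<double>(target_short) / short_side;
  return {static_cast<int>(std::lround(in.width * scale)),
          static_cast<int>(std::lround(in.height * scale))};
}

// Hypothetical: a crop_w x crop_h window centered inside an image of size `in`.
Rect CenterCropRect(Size in, int crop_w, int crop_h) {
  assert(crop_w <= in.width && crop_h <= in.height);
  return {(in.width - crop_w) / 2, (in.height - crop_h) / 2, crop_w, crop_h};
}
```

Under these assumptions a 1920x1080 frame is resized to 455x256, and the 224x224 crop window starts at (115, 16).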
| Optimization / Image size | 2560x1440 | 1920x1080 | 1280x720 | 640x480 |
|---|---|---|---|---|
| Original | 9.72 | 8.42 | 6.24 | 5.40 |
| Reuse input & output tensors | 8.83 | 7.30 | 6.09 | 5.14 |
| Reuse input & output tensors, fuse BGR2RGB + Normalize + HWC2CHW | 5.48 | 5.36 | 4.67 | 4.37 |
| Reuse input & output tensors, fuse BGR2RGB + Normalize + HWC2CHW, OpenCV_CUDA | 6.12 | 5.26 | 4.33 | 3.89 |
| Reuse input & output tensors, fuse BGR2RGB + Normalize + HWC2CHW, OpenCV_CUDA with resize on CPU pinned memory | 4.91 | 4.36 | 4.35 | 4.11 |
| Speedup (Original / last row) | 1.98x | 1.93x | 1.43x | 1.31x |
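The speedup figures match the Original latency divided by the latency of the last optimization row in every column (for example, 9.72 / 4.91 ≈ 1.98). A small C++ helper that reproduces them, under the hypothetical name `Speedups`:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: speedup = original latency / optimized latency,
// rounded to two decimal places, one value per image-size column.
std::array<double, 4> Speedups(const std::array<double, 4>& original,
                               const std::array<double, 4>& optimized) {
  std::array<double, 4> result{};
  for (std::size_t i = 0; i < original.size(); ++i) {
    assert(optimized[i] > 0.0);  // latencies must be positive
    result[i] = std::round(original[i] / optimized[i] * 100.0) / 100.0;
  }
  return result;
}
```

For 1280x720 and 640x480, the OpenCV_CUDA row without the pinned-memory resize (4.33 ms and 3.89 ms) is faster than the last row, so dividing by it would give about 1.44x and 1.39x instead.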