FastDeploy
[Model] YOLO: use external stream and avoid reallocating output tensors
PR types
Model
Describe
- Avoid reallocating output tensors on every Predict() call
- Use an external stream when CUDA preprocessing is enabled, to avoid reallocating CUDA streams
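A minimal C++ sketch of the buffer-reuse idea in the first bullet, assuming a simplified detector that keeps its output storage in a `std::vector<float>` member. `HypotheticalDetector`, `Predict(std::size_t)` and `allocations()` are illustrative names, not FastDeploy's actual API, and the real CUDA streams and TensorRT bindings are left out so the sketch builds with the standard library alone. The external-stream change is similar in spirit: a handle created once outside `Predict()` is stored and reused rather than created on each call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a detector; not FastDeploy's YOLOv5 class.
class HypotheticalDetector {
 public:
  // Runs one placeholder "inference" and returns a pointer to the output.
  // The buffer is a class member, so once it has enough capacity its
  // storage is reused instead of being allocated again on every call.
  const float* Predict(std::size_t output_size) {
    assert(output_size > 0);
    if (output_buffer_.capacity() < output_size) {
      ++allocations_;  // counts only calls that must grow the buffer
    }
    output_buffer_.resize(output_size);  // no reallocation if capacity suffices
    for (std::size_t i = 0; i < output_size; ++i) {
      output_buffer_[i] = static_cast<float>(i);  // placeholder results
    }
    return output_buffer_.data();
  }

  int allocations() const { return allocations_; }

 private:
  std::vector<float> output_buffer_;  // persists across Predict() calls
  int allocations_ = 0;
};
```

Calling `Predict()` twice with the same output size returns the same data pointer, and `allocations()` stays at 1, because `std::vector::resize` only reallocates when the requested size exceeds the current capacity.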
YOLOv5s Predict() latency (P40, TensorRT 8.4.3.1, 640x640 input):
- Before this PR, with UseCudaPreprocessing and EnablePinnedMemory enabled: 41 ms
- After moving the output tensors into class members, so the output buffers are no longer reallocated on each call: 25 ms
- After also using an external stream, so CUDA streams are no longer reallocated: still 25 ms, which means allocating a CUDA stream doesn't take a significant amount of time.