FastDeploy [Model] yolo use external stream, avoid reallocating output tensors

Open wang-xinyu opened this issue 3 years ago • 1 comment

Model

Avoid reallocating output tensors
Use an external stream when CUDA preprocessing is enabled, to avoid re-creating CUDA streams

Oct 27 '22 09:10 wang-xinyu

YOLOv5s Predict() latency (P40, TRT 8.4.3.1, 640x640):

Currently, with UseCudaPreprocessing and EnablePinnedMemory: 41 ms

After moving the output tensors to class members, which avoids reallocating the output buffers: 25 ms

After using an external stream, which avoids re-creating CUDA streams: still 25 ms, which means creating a CUDA stream doesn't take long.
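
For readers who want to see the shape of the two changes, here is a minimal host-only sketch. It is not the FastDeploy code: `Buffer`, `StreamHandle`, and `Detector` are hypothetical stand-ins, a `std::vector` plays the role of the output tensor, and an opaque pointer plays the role of a caller-owned `cudaStream_t`. The output size used in the usage note is illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an output tensor (not a FastDeploy type).
// It counts how often its storage actually has to grow.
struct Buffer {
  std::vector<float> data;
  static int allocations;

  void Resize(std::size_t n) {
    if (n > data.capacity()) ++allocations;  // storage must be (re)allocated
    data.resize(n);
  }
};
int Buffer::allocations = 0;

// Hypothetical stream handle. With CUDA this would be a cudaStream_t
// that the caller created once and passes in.
using StreamHandle = void*;

class Detector {
 public:
  // The stream is borrowed from the caller, so the detector never
  // creates or destroys one per call.
  explicit Detector(StreamHandle external_stream) : stream_(external_stream) {}

  // The output buffer is a class member, so repeated calls with the same
  // output size reuse the storage allocated by the first call.
  const Buffer& Predict(std::size_t output_size) {
    output_.Resize(output_size);
    // ... inference enqueued on stream_ would write into output_ here ...
    return output_;
  }

  StreamHandle stream() const { return stream_; }

 private:
  StreamHandle stream_;  // borrowed, not owned
  Buffer output_;        // reused across Predict() calls
};
```

Constructing one `Detector` and calling `Predict(1000)` in a loop grows the buffer only on the first call; later calls find enough capacity and skip the allocation, which is the effect the 41 ms to 25 ms measurement attributes to the member output tensors.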

Oct 27 '22 09:10 wang-xinyu