[Model] yolo use external stream, avoid reallocating output tensors

Open wang-xinyu opened this issue 3 years ago • 1 comments

PR types

Model

Describe

  • Avoid reallocating output tensors on every prediction
  • Use an external stream when CUDA preprocessing is enabled, to avoid re-creating CUDA streams

wang-xinyu avatar Oct 27 '22 09:10 wang-xinyu

YOLOv5s Predict() latency (P40, TRT 8.4.3.1, 640x640):

Currently, with UseCudaPreprocessing and EnablePinnedMemory enabled: 41 ms

After moving the output tensors to class members, which avoids reallocating the output buffers: 25 ms

After also using an external stream, which avoids re-creating CUDA streams: still 25 ms, which means that creating a CUDA stream does not take long.
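The buffer-reuse idea behind the 41 ms to 25 ms change can be sketched roughly as below. This is a minimal CPU-only illustration, not the FastDeploy code: `Tensor`, `Detector`, and the `allocations()` counter are hypothetical names, and a `std::vector<float>` stands in for a pinned or device output buffer. The point is only that buffers held as class members are allocated on the first call and reused afterwards, as long as the output shape does not grow.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an output tensor (not FastDeploy's tensor type).
struct Tensor {
  std::vector<float> data;
};

// Hypothetical detector that keeps its output tensors as class members,
// so repeated Predict() calls reuse the same storage.
class Detector {
 public:
  // Prepares `num_outputs` output tensors of `elems` floats each and returns
  // them. A real backend would run inference and write into these buffers.
  const std::vector<Tensor>& Predict(std::size_t num_outputs,
                                     std::size_t elems) {
    if (outputs_.size() != num_outputs) {
      outputs_.resize(num_outputs);
    }
    for (Tensor& t : outputs_) {
      // Count an allocation only when the existing capacity is too small.
      if (t.data.capacity() < elems) {
        ++allocations_;
      }
      // resize() does not reallocate when capacity is already sufficient.
      t.data.resize(elems);
    }
    return outputs_;
  }

  // Number of buffer allocations performed so far.
  std::size_t allocations() const { return allocations_; }

 private:
  std::vector<Tensor> outputs_;  // reused across Predict() calls
  std::size_t allocations_ = 0;
};
```

With a single output of 25200 x 85 floats (used here purely as an illustrative size), calling `Predict` twice on the same `Detector` allocates once, and the second call returns a tensor backed by the same memory as the first.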

wang-xinyu avatar Oct 27 '22 09:10 wang-xinyu