FastDeploy
[Backend] Enable TensorRT BatchedNMSDynamic_TRT plugin
PR types
TensorRT backend
Describe
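- Remove the existing trick logic for deploying PaddleDetection models and use the TensorRT BatchedNMSDynamic_TRT plugin instead (EfficientNMS_TRT cannot reproduce the results of all PaddleDetection detection models)
Impact of this PR on accuracy and performance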
Model | Backend | Precision | Accuracy | Runtime per sample | End-to-end per sample |
---|---|---|---|---|---|
PP-YOLOE-L | PP-TensorRT | FP32 | 51.4% | 56.91ms | 64.29ms |
PP-YOLOE-L | PP-TensorRT | FP16 | | 46.76ms | 52.69ms |
PP-YOLOE-L(v0.4) | TensorRT | FP32 | 51.4% | 10.99ms | 67.36ms |
PP-YOLOE-L(v0.4) | TensorRT | FP16 | | 5.44ms | 44.14ms |
PP-YOLOE-L(dev) | TensorRT | FP32 | 51.4% | 14.23ms | 18.46ms |
PP-YOLOE-L(dev) | TensorRT | FP16 | | 7.83ms | 12.11ms |
YOLOv3-Dark53 | PP-TensorRT | FP32 | | 14.41ms | 19.08ms |
YOLOv3-Dark53 | PP-TensorRT | FP16 | | 10.89ms | 16.43ms |
YOLOv3-Dark53(0.4) | TensorRT | FP32 | | 10.99ms | 19.22ms |
YOLOv3-Dark53(0.4) | TensorRT | FP16 | | 6.30ms | 13.52ms |
YOLOv3-Dark53(dev) | TensorRT | FP32 | | 10.7831ms | 15.20ms |
YOLOv3-Dark53(dev) | TensorRT | FP16 | | 5.03ms | 9.38ms |
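Note: in v0.4, when running on TensorRT, the NMS in a detection model was automatically split out of the model and moved into postprocessing. As a result, v0.4 can show a faster Runtime time but a slower total end-to-end time.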
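
To make the note concrete, the short Python sketch below subtracts the Runtime time from the end-to-end time for the TensorRT rows of the table above. The timings are copied from the table; the names `TIMINGS`, `outside_runtime_ms`, and `overhead_table` are illustrative and not part of FastDeploy. The difference covers everything outside the Runtime call (preprocessing and postprocessing, which in v0.4 included NMS), so it is only a rough indicator, not an isolated NMS cost.

```python
# (runtime_ms, end_to_end_ms) per sample, copied from the table above.
# Keys are (model, version, precision); the key layout is an illustrative choice.
TIMINGS = {
    ("PP-YOLOE-L", "v0.4", "FP32"): (10.99, 67.36),
    ("PP-YOLOE-L", "v0.4", "FP16"): (5.44, 44.14),
    ("PP-YOLOE-L", "dev", "FP32"): (14.23, 18.46),
    ("PP-YOLOE-L", "dev", "FP16"): (7.83, 12.11),
    ("YOLOv3-Dark53", "v0.4", "FP32"): (10.99, 19.22),
    ("YOLOv3-Dark53", "v0.4", "FP16"): (6.30, 13.52),
    ("YOLOv3-Dark53", "dev", "FP32"): (10.7831, 15.20),
    ("YOLOv3-Dark53", "dev", "FP16"): (5.03, 9.38),
}


def outside_runtime_ms(runtime_ms, end_to_end_ms):
    """Time spent outside the Runtime call (pre/postprocessing), in ms."""
    return round(end_to_end_ms - runtime_ms, 2)


def overhead_table(timings):
    """Map each (model, version, precision) key to its outside-Runtime time."""
    return {key: outside_runtime_ms(*value) for key, value in timings.items()}


if __name__ == "__main__":
    for (model, version, precision), ms in overhead_table(TIMINGS).items():
        print(f"{model:14s} {version:5s} {precision}: {ms:.2f} ms outside Runtime")
```

For PP-YOLOE-L FP32 this gives about 56.37 ms outside the Runtime in v0.4 versus about 4.23 ms on dev, which matches the note: the Runtime itself is slower on dev (14.23 ms vs 10.99 ms) because NMS now runs inside the TensorRT engine, but the end-to-end time drops.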