yolov9
Performance: YOLOv7 vs YOLOv9-C vs YOLOv9-E over a TensorRT engine
Performance test using an RTX 4090 GPU on an AMD Ryzen 7 3700X 8-Core / 16 GB RAM.
Model performance evaluation using the TensorRT engine served with Triton Server.
All models (YOLOv7, YOLOv9-C, and YOLOv9-E) were deployed with FP16 precision at input size 640.
The tests showed that the YOLOv9-C model needs further optimization to perform well with the TensorRT engine: it performed worse than its predecessor, YOLOv7. The next step is to profile the models and identify bottlenecks. I'll use TensorRT Engine Explorer (TREx) for profiling and will return here with the results.
All tests can be reproduced using this repo.
Test result: individual performance of each model on a single RTX 4090.
Model | Latency | Throughput |
---|---|---|
YOLOv7 | 4.0 ms | 265 infer/sec |
YOLOv9-C | 4.4 ms | 240 infer/sec |
YOLOv9-E | 3.8 ms | 282 infer/sec |
Test result: global performance, i.e. the maximum achievable throughput of each model on a single RTX 4090.
Model | Concurrency | Latency | Throughput |
---|---|---|---|
YOLOv7 | 32 | 29.1 ms | 1095 infer/sec |
YOLOv9-C | 32 | 43.9 ms | 728 infer/sec |
YOLOv9-E | 32 | 28.9 ms | 1103 infer/sec |
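As a sanity check on the concurrency results above (my own arithmetic, not part of the original report): at steady state, throughput should roughly equal concurrency divided by average latency, and the reported figures are consistent with that.

```python
# Sanity check: at steady state, throughput ≈ concurrency / average latency.
# The figures are copied from the report above; the check itself is my addition.
results = {
    "YOLOv7":   {"concurrency": 32, "latency_ms": 29.1, "throughput": 1095},
    "YOLOv9-C": {"concurrency": 32, "latency_ms": 43.9, "throughput": 728},
    "YOLOv9-E": {"concurrency": 32, "latency_ms": 28.9, "throughput": 1103},
}

for name, r in results.items():
    predicted = r["concurrency"] / (r["latency_ms"] / 1000.0)  # infer/sec
    # predicted and measured agree within a few percent for all three models
    print(f"{name}: measured {r['throughput']}, predicted {predicted:.0f}")
```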
Full report: https://github.com/levipereira/triton-server-yolo-v7-v9/tree/master/perfomance
@levipereira
Thanks for providing the TRT performance reports.
I noticed that you used yolov9-c.pt for exporting and performance testing. Actually, yolov9-c.pt contains the PGI auxiliary branch, which can be removed at the inference stage. Could you help by using yolov9-c-converted.pt and yolov9-e-converted.pt to get more performance reports? Their architectures are the same as gelan-c.pt and gelan-e.pt, respectively.
The converted weights are provided here: yolov9-c-converted.pt, yolov9-e-converted.pt
https://github.com/levipereira/yolov9/blob/main/models/experimental.py#L140
output = output[0]
returns the output of the auxiliary branch.
output = output[1]
returns the output of the main branch, which is the correct one.
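To illustrate the branch selection described above, here is a minimal stand-in sketch. The function below is a dummy stub, not the real YOLOv9 model class; only the tuple indexing mirrors what the linked line in experimental.py does.

```python
# Dummy stand-in for the dual-branch model: the real YOLOv9 forward pass
# returns outputs for both the PGI auxiliary branch and the main branch.
def dual_branch_model(x):
    aux = f"aux({x})"    # PGI auxiliary branch (useful only during training)
    main = f"main({x})"  # main branch, the correct output for inference
    return aux, main

output = dual_branch_model("image")
output = output[1]  # index 1 selects the main branch, as noted above
print(output)       # prints "main(image)"
```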
https://github.com/WongKinYiu/yolov9/issues/130#issuecomment-1974964596
Performance test using an RTX 2080Ti 22GB GPU on an AMD Ryzen 7 5700X 8-Core / 128 GB RAM.
Model performance evaluation using the TensorRT engine via TensorRT-YOLO.
All models were deployed with FP16 precision, batch size 4, and input size 640.
YOLOv9 Series
This includes the YOLOv9-C, YOLOv9-E, YOLOv9-C-Converted, YOLOv9-E-Converted, GELAN-C and GELAN-E.
YOLOv9-C | YOLOv9-E | YOLOv9-C-Converted | YOLOv9-E-Converted | GELAN-C | GELAN-E |
---|---|---|---|---|---|
Average latency: 36.615 ms | Average latency: 59.736 ms | Average latency: 19.689 ms | Average latency: 53.144 ms | Average latency: 19.557 ms | Average latency: 53.575 ms |
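A quick calculation on these latencies (my own arithmetic, not from the original post) quantifies the gain from dropping the auxiliary branch, and shows that the converted models land essentially on top of their GELAN counterparts, as expected given the identical inference-time architectures:

```python
# Average batch-4 latencies (ms) copied from the table above.
latency_ms = {
    "YOLOv9-C": 36.615, "YOLOv9-C-Converted": 19.689, "GELAN-C": 19.557,
    "YOLOv9-E": 59.736, "YOLOv9-E-Converted": 53.144, "GELAN-E": 53.575,
}

# Speedup from removing the PGI auxiliary branch at inference time.
speedup_c = latency_ms["YOLOv9-C"] / latency_ms["YOLOv9-C-Converted"]  # ~1.86x
speedup_e = latency_ms["YOLOv9-E"] / latency_ms["YOLOv9-E-Converted"]  # ~1.12x
print(f"YOLOv9-C: {speedup_c:.2f}x faster, YOLOv9-E: {speedup_e:.2f}x faster")
```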
YOLOv8 Series
This includes the YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l and YOLOv8x.
YOLOv8n | YOLOv8s | YOLOv8m | YOLOv8l | YOLOv8x |
---|---|---|---|---|
Average latency: 10.289 ms | Average latency: 12.459 ms | Average latency: 18.514 ms | Average latency: 24.926 ms | Average latency: 34.587 ms |
Hi @WongKinYiu, I apologize for the delay in responding; work has been taking up a lot of my time. I'm deeply involved in assessing the performance of YOLOv9 and have gathered some valuable results comparing it to YOLOv7, which I'll share in the next few days. I'm preparing a more detailed report and need to characterize the differences accurately.
The original post included results affected by variables that shouldn't have factored into measuring the models' performance, which is why I edited it.