yolov9
Performance: YOLOv7 vs YOLOv9-C vs YOLOv9-E over a TensorRT engine
Performance test using an RTX 4090 GPU on an AMD Ryzen 7 3700X 8-Core / 16 GB RAM.
Model performance evaluation using the TensorRT engine served with Triton Server.
All models (YOLOv7, YOLOv9-C, and YOLOv9-E) were deployed with FP16 precision at input size 640.
The tests showed that the YOLOv9-C model needs further optimization to perform well with the TensorRT engine: it performed worse than its predecessor, YOLOv7. The next step is to profile the models and identify bottlenecks. I'll use TensorRT Engine Explorer (TREx) for profiling and will return here with the results.
All tests can be reproduced using this repo.
Test result: individual performance of each model on a single RTX 4090.
Model | Latency | Throughput |
---|---|---|
YOLOv7 | 4.0 ms | 265 infer/sec |
YOLOv9-C | 4.4 ms | 240 infer/sec |
YOLOv9-E | 3.8 ms | 282 infer/sec |
Test result: global performance, i.e. the maximum achievable throughput of each model on a single RTX 4090.
Model | Concurrency | Latency | Throughput |
---|---|---|---|
YOLOv7 | 32 | 29.1 ms | 1095 infer/sec |
YOLOv9-C | 32 | 43.9 ms | 728 infer/sec |
YOLOv9-E | 32 | 28.9 ms | 1103 infer/sec |
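As a sanity check on the concurrency results above (my own arithmetic, not part of the original report): at steady state, throughput should roughly equal concurrency divided by average latency, and the reported figures are consistent with that.

```python
# Sanity check: at steady state, throughput ≈ concurrency / average latency.
# The figures are copied from the report above; the check itself is my addition.
results = {
    "YOLOv7":   {"concurrency": 32, "latency_ms": 29.1, "throughput": 1095},
    "YOLOv9-C": {"concurrency": 32, "latency_ms": 43.9, "throughput": 728},
    "YOLOv9-E": {"concurrency": 32, "latency_ms": 28.9, "throughput": 1103},
}

for name, r in results.items():
    predicted = r["concurrency"] / (r["latency_ms"] / 1000.0)  # infer/sec
    # predicted and measured agree within a few percent for all three models
    print(f"{name}: measured {r['throughput']}, predicted {predicted:.0f}")
```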
Full report: https://github.com/levipereira/triton-server-yolo-v7-v9/tree/master/perfomance
@levipereira
Thanks for providing the TRT performance reports.
I noticed that you used yolov9-c.pt for exporting and performance testing. Actually, yolov9-c.pt contains the PGI auxiliary branch, which can be removed at the inference stage. Could you help by using yolov9-c-converted.pt and yolov9-e-converted.pt to get more performance reports? Their architectures are the same as gelan-c.pt and gelan-e.pt, respectively.
The converted weights are provided here: yolov9-c-converted.pt, yolov9-e-converted.pt
https://github.com/levipereira/yolov9/blob/main/models/experimental.py#L140
output = output[0]
returns the output of the auxiliary branch.
output = output[1]
returns the output of the main branch, which is the correct one.
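To illustrate the branch selection described above, here is a minimal stand-in sketch. The function below is a dummy stub, not the real YOLOv9 model class; only the tuple indexing mirrors what the linked line in experimental.py does.

```python
# Dummy stand-in for the dual-branch model: the real YOLOv9 forward pass
# returns outputs for both the PGI auxiliary branch and the main branch.
def dual_branch_model(x):
    aux = f"aux({x})"    # PGI auxiliary branch (useful only during training)
    main = f"main({x})"  # main branch, the correct output for inference
    return aux, main

output = dual_branch_model("image")
output = output[1]  # index 1 selects the main branch, as noted above
print(output)       # prints "main(image)"
```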
https://github.com/WongKinYiu/yolov9/issues/130#issuecomment-1974964596
Performance test using an RTX 2080Ti 22GB GPU on an AMD Ryzen 7 5700X 8-Core / 128 GB RAM.
Model performance evaluation using the TensorRT engine via TensorRT-YOLO.
All models were deployed with FP16 precision, batch size 4, and input size 640.
YOLOv9 Series
This includes the YOLOv9-C, YOLOv9-E, YOLOv9-C-Converted, YOLOv9-E-Converted, GELAN-C and GELAN-E.
YOLOv9-C | YOLOv9-E | YOLOv9-C-Converted | YOLOv9-E-Converted | GELAN-C | GELAN-E |
---|---|---|---|---|---|
Average latency: 36.615 ms | Average latency: 59.736 ms | Average latency: 19.689 ms | Average latency: 53.144 ms | Average latency: 19.557 ms | Average latency: 53.575 ms |
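A quick calculation on these latencies (my own arithmetic, not from the original post) quantifies the gain from dropping the auxiliary branch, and shows that the converted models land essentially on top of their GELAN counterparts, as expected given the identical inference-time architectures:

```python
# Average batch-4 latencies (ms) copied from the table above.
latency_ms = {
    "YOLOv9-C": 36.615, "YOLOv9-C-Converted": 19.689, "GELAN-C": 19.557,
    "YOLOv9-E": 59.736, "YOLOv9-E-Converted": 53.144, "GELAN-E": 53.575,
}

# Speedup from removing the PGI auxiliary branch at inference time.
speedup_c = latency_ms["YOLOv9-C"] / latency_ms["YOLOv9-C-Converted"]  # ~1.86x
speedup_e = latency_ms["YOLOv9-E"] / latency_ms["YOLOv9-E-Converted"]  # ~1.12x
print(f"YOLOv9-C: {speedup_c:.2f}x faster, YOLOv9-E: {speedup_e:.2f}x faster")
```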
YOLOv8 Series
This includes the YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l and YOLOv8x.
YOLOv8n | YOLOv8s | YOLOv8m | YOLOv8l | YOLOv8x |
---|---|---|---|---|
Average latency: 10.289 ms | Average latency: 12.459 ms | Average latency: 18.514 ms | Average latency: 24.926 ms | Average latency: 34.587 ms |
Hi @WongKinYiu, I apologize for the delay in responding; work has been taking up a lot of my time. I'm deeply involved in assessing the performance of YOLOv9 and have gathered some valuable results comparing it to YOLOv7, which I'll share in the next few days. I'm preparing a more detailed report and need to characterize the differences accurately.
The original post included results affected by variables that shouldn't have factored into measuring the models' performance, which is why I edited it.