
Where is the speed comparison?

Open dhddxdhd opened this issue 10 months ago • 5 comments

The most important thing is a speed comparison. Not comparing speed is just playing games.

dhddxdhd commented Apr 17, 2024

Based on TensorRT-YOLO.

RTX 2080 Ti, FP16, batch size 4, input size 640:

| Model | Average latency |
| --- | --- |
| YOLOv9-C | 19.689 ms |
| YOLOv8m | 18.514 ms |
| YOLOv8l | 24.926 ms |
| YOLOv8x | 34.587 ms |
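
For context, a minimal sketch of one way to measure average latency like this, assuming a hypothetical FP16 `model` already on the GPU; this is plain PyTorch timing, not the actual TensorRT-YOLO harness:

```python
import torch

def average_latency_ms(model, batch=4, size=640, iters=100, warmup=20):
    """Mean GPU latency per batch in ms, timed with CUDA events."""
    x = torch.randn(batch, 3, size, size, device="cuda", dtype=torch.half)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(warmup):     # warm up clocks and caches first
            model(x)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()    # wait for all timed kernels to finish
    return start.elapsed_time(end) / iters

# usage (hypothetical): print(average_latency_ms(yolov9_c_fp16))
```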

WongKinYiu commented Apr 18, 2024

Where is the comparison chart? I can't make sense of the parameters, and what is there to gain from comparing FLOPs? The most essential thing, the speed comparison, isn't even mentioned in the paper?!

dhddxdhd commented Apr 18, 2024

Based on YOLOv6 3.0:

- T4, TRT7: YOLOv7 is 13% faster than YOLOv6 3.0.
- T4, TRT8: YOLOv7 is 20% faster than YOLOv6 3.0.
- V100: YOLOv7 is 33% faster than YOLOv6 3.0.

Inference speed depends on the device (V100 vs T4) and the tooling (TRT8 vs TRT7) you use, and some recent works do not report which tools they used, which makes a fair comparison of inference speed impossible. When models are built from similar computational blocks, FLOPs is the more objective metric.
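
For what it's worth, FLOPs can be counted in a device-independent way. A minimal sketch using the third-party thop package; the toy module here is a hypothetical stand-in, not YOLOv9:

```python
import torch
from thop import profile  # third-party FLOPs counter: pip install thop

# Toy stand-in module; swap in the real detector to compare models.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
    torch.nn.SiLU(),
)
x = torch.randn(1, 3, 640, 640)

macs, params = profile(model, inputs=(x,))
# thop reports multiply-accumulates; FLOPs is conventionally 2 * MACs.
print(f"{2 * macs / 1e9:.2f} GFLOPs, {params / 1e6:.2f}M params")
```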

WongKinYiu commented Apr 18, 2024

Thank you for the detailed reply. I fully agree with your points, and I'm ashamed of my ignorant remarks earlier. Still, I'm very much looking forward to a speed comparison later on.

dhddxdhd commented Apr 18, 2024

If its speed and accuracy beat YOLOv8, I'll consider studying v9. I also think it's pretty outrageous that the paper doesn't even include a speed comparison chart.

ocrhei commented Apr 21, 2024

Coming back to v9, I still think it's garbage and doesn't deserve the name. It falls far short of the just-released v10; from the numbers published with v10 you can see v9's speed is dismal. No wonder the authors didn't dare to release a speed comparison chart.

dhddxdhd commented May 24, 2024

It's actually a problem with the NMS API. In the table you can see that the v9 model's inference is 24% faster than v8's, yet NMS somehow accounts for 42% of the total inference time. With a normal NMS API it would only take about 0.4 ms. Also, I don't know why the mAP is listed as 52.5.

In other words:

| Model | mAP | Latency (ms) | Forward latency (ms) |
| --- | --- | --- | --- |
| YOLOv6-L | 52.9 | ? | 8.1 |
| YOLOv9-C | 53.0 | 6.5 | 6.1 |
| YOLOv10-B | 52.5 | 5.7 | 5.7 |
| YOLOv10-L | 53.2 | 7.3 | 7.2 |

[image: screenshot of the benchmark comparison table]

WongKinYiu commented May 25, 2024

@dhddxdhd

I found the likely cause.

My guess is that when YOLOv10 measured YOLOv9's speed, they fed every prediction into NMS, whereas YOLOv9 actually only passes predictions whose score exceeds the confidence threshold to NMS.

So when timing YOLOv9, YOLOv10 probably ran NMS over all (6400 + 1600 + 400) predictions, while on COCO the average number of YOLOv9 predictions above the threshold is around 300.

NMS is O(N) in the best case and O(N^2) in the worst case. That is why YOLOv10 estimated YOLOv9's NMS at 4.44 ms, while YOLOv9's actual average NMS time is < 0.4 ms. Even without TRT's NMS acceleration, YOLOv9's average NMS time is still < 1 ms.
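
A minimal sketch of the difference, using torchvision's generic NMS and random stand-in tensors (the thresholds and score distribution are illustrative, not the actual YOLOv9 or YOLOv10 benchmark code):

```python
import torch
from torchvision.ops import nms

# 8400 candidates = 6400 + 1600 + 400 anchors across the three strides.
boxes = torch.rand(8400, 4) * 320
boxes[:, 2:] += boxes[:, :2]        # make boxes valid: x2 > x1, y2 > y1
scores = torch.rand(8400).pow(20)   # skew scores low, like real detector output

conf_thres, iou_thres = 0.25, 0.65

# YOLOv9-style decoding: drop low-score predictions first, so NMS only
# sees the few hundred boxes that actually cleared the threshold.
mask = scores > conf_thres
kept_filtered = nms(boxes[mask], scores[mask], iou_thres)

# Running NMS over all 8400 candidates instead is the pathological case
# that inflates the measured NMS cost.
kept_all = nms(boxes, scores, iou_thres)

print(f"NMS input with score filter:    {int(mask.sum())} boxes")
print(f"NMS input without score filter: {boxes.shape[0]} boxes")
```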

WongKinYiu commented May 25, 2024

@WongKinYiu

Out of curiosity, why does the tiny architecture have to wait until the paper is accepted before it can be released?

twmht commented May 29, 2024

Our collaborating partners wanted the small models released next year. The earliest release window we have negotiated so far is upon paper acceptance. Recently they agreed that the medium model can come earlier, so I will release the medium model at the end of this month. We are still in talks, hoping to release all the models sooner.

WongKinYiu commented May 29, 2024