
EfficientNetb0-Yolo speed slow

Open Code-Fight opened this issue 5 years ago • 19 comments

Hi @AlexeyAB, thanks for your work.

I tried to run EfficientNetb0-Yolo on a 1080 Ti, but it only reaches 10 fps.

EfficientNetb0-Yolo is only 3.7 BFLOPs, so why does it run so slowly?

I would like to increase the inference speed. What should I do?

Thank you very much.
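For readers who want to put the two numbers above side by side, here is a rough back-of-the-envelope sketch. The 3.7 BFLOPs and 10 fps come from the post; the GPU peak figure is an assumption added for illustration (an approximate FP32 peak for a GTX 1080 Ti), not something stated in this thread.

```python
# Effective throughput implied by the numbers in the post above.
bflops_per_frame = 3.7  # from the post: cost of one forward pass, in billions of FLOPs
fps = 10.0              # from the post: measured speed on the 1080 Ti

# Billions of FLOPs per second the GPU actually sustained on this model.
achieved_gflops = bflops_per_frame * fps

# ASSUMPTION (not from the thread): approximate FP32 peak of a GTX 1080 Ti, in GFLOP/s.
assumed_peak_gflops = 11_000.0

# Fraction of the assumed peak that the model reaches.
utilization = achieved_gflops / assumed_peak_gflops

print(f"achieved throughput: {achieved_gflops:.1f} GFLOP/s")
print(f"share of assumed peak: {utilization:.2%}")
```

Under that assumption the model uses well under 1% of the card's peak arithmetic rate, which suggests the FLOP count alone is not what limits the speed here.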

Code-Fight avatar Dec 04 '19 07:12 Code-Fight

depthwise convolution models like efficientnet are very slow on GPU. If you need GPU realtime, consider yolov3-spp or csresnext50

LukeAI avatar Dec 04 '19 11:12 LukeAI

Hi, @LukeAI
I got it, thank you very much. Can I run it in real time on a CPU? I tried to run it on a CPU with OpenCV, but it failed. What should I do? Thanks.

Code-Fight avatar Dec 04 '19 12:12 Code-Fight

Ask there when EfficientNet-Yolo will be added to OpenCV-dnn: https://github.com/opencv/opencv/issues/15987

At least the Swish and Mish activations have been added: https://github.com/opencv/opencv/pull/15808

AlexeyAB avatar Dec 04 '19 13:12 AlexeyAB

Hi @AlexeyAB, thanks for your reply. I will follow it.

Code-Fight avatar Dec 04 '19 13:12 Code-Fight

@Code-Fight @AlexeyAB

Our YOLOv3-tiny-PRN runs at 48 fps on an i7-8750H CPU and 71 fps on an i9-9900K, using the OpenCV dnn module (80 classes). https://github.com/opencv/opencv/pull/15739 Maybe you can consider it until efficientnet-YOLO-PRN is supported by OpenCV.
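As a minimal sketch of how fps figures like these could be reproduced with the OpenCV dnn module: the helper names, the 416 input size, and every file path below are hypothetical and not taken from this thread. Only `cv2.dnn.readNetFromDarknet`, `cv2.dnn.blobFromImage`, `setInput`, `forward` and `getUnconnectedOutLayersNames` are real OpenCV calls.

```python
import time


def measure_fps(run_once, warmup=3, iters=20):
    """Call run_once() repeatedly and return the average number of calls per second."""
    for _ in range(warmup):  # untimed runs so one-off setup cost is excluded
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed


def make_opencv_runner(cfg_path, weights_path, image_path, size=416):
    """Build a zero-argument callable that runs one forward pass with OpenCV dnn.

    size=416 is an assumed network input size, not a value from the thread.
    """
    import cv2  # imported lazily so measure_fps stays usable without OpenCV

    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # Scale pixels to [0, 1], resize to the network input, and convert BGR to RGB.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (size, size), swapRB=True, crop=False)
    out_names = net.getUnconnectedOutLayersNames()

    def run_once():
        net.setInput(blob)
        return net.forward(out_names)

    return run_once
```

With a cfg, weights file and test image on disk (hypothetical names), `measure_fps(make_opencv_runner("model.cfg", "model.weights", "test.jpg"))` would time the forward pass only, without image decoding or post-processing, so it may read higher than an end-to-end number.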

WongKinYiu avatar Dec 04 '19 17:12 WongKinYiu

Hi @WongKinYiu, thanks for your work on "YOLOv3-tiny-PRN". I have used this model and it works well. Thanks again.

Code-Fight avatar Dec 05 '19 00:12 Code-Fight

@LukeAI @WongKinYiu @AlexeyAB I found that both efficientnet and cspresnext50 use groups in their .cfg, like the following:

[convolutional]
batch_normalize=1
filters=128
size=3
groups=32
stride=1
pad=1
activation=leaky

Given the comment at https://github.com/AlexeyAB/darknet/issues/4447#issuecomment-561611786:

depthwise convolution models like efficientnet are very slow on GPU

why can cspresnext50 achieve real-time speed on GPU? (image)

nyj-ocean avatar Dec 06 '19 08:12 nyj-ocean

Because it uses group convolution, not depth-wise convolution.

groups=1 is equivalent to a vanilla convolution; its base (the number of channels per group) is equal to k, the number of channels. groups=k is equivalent to a depth-wise convolution; its base is equal to 1. In ResNeXt50, groups=32 and the base is 4~32, so its speed falls between those of the two settings above.
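To make the "base" idea concrete, here is a small sketch that counts channels per group, weights and multiply-accumulates (MACs) for one 3x3 layer under the three settings. The function name, the 128 input channels and the 52x52 output size are assumptions chosen for illustration; only filters=128, size=3 and groups=32 appear in the cfg excerpt earlier in the thread.

```python
def grouped_conv_cost(c_in, c_out, kernel, groups, out_h, out_w):
    """Return (channels per group, weight count, MACs) for one grouped conv layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both c_in and c_out")
    base = c_in // groups                      # input channels seen by each filter
    weights = c_out * base * kernel * kernel   # each filter covers only `base` channels
    macs = weights * out_h * out_w             # one filter application per output pixel
    return base, weights, macs


k = 128  # assumed channel count for both input and output
for groups in (1, 32, k):  # vanilla, ResNeXt-style, depth-wise
    base, weights, macs = grouped_conv_cost(k, k, 3, groups, 52, 52)
    print(f"groups={groups:3d}  base={base:3d}  weights={weights:7d}  MACs={macs / 1e6:8.2f}M")
```

The MAC count falls by a factor of `groups`, which is why a depth-wise network reports so few BFLOPs. The point made in this thread is that measured speed does not fall in the same proportion.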

CSPResNeXt50 can achieve real-time speed on GPU because it needs 40% less computation than YOLOv3.

Also, look at EfficientNet-EdgeTPU: even though it needs 7 times the computation of EfficientNet, it runs faster than EfficientNet. (image) So if you want real-time performance on a mobile GPU using depth-wise convolution, you may have to lower the computation to 0.4 BFLOPs, as in ThunderNet, for example.

WongKinYiu avatar Dec 06 '19 09:12 WongKinYiu

@WongKinYiu You're so awesome that I don't have words. Fabulous work on the OpenCV integration.

ashuezy avatar Dec 06 '19 11:12 ashuezy

@WongKinYiu Thanks a lot. That lets me learn more about group convolution and depth-wise convolution.

nyj-ocean avatar Dec 07 '19 01:12 nyj-ocean

@Code-Fight @AlexeyAB

Our YOLOv3-tiny-PRN runs at 48 fps on an i7-8750H CPU and 71 fps on an i9-9900K, using the OpenCV dnn module (80 classes). opencv/opencv#15739 Maybe you can consider it until efficientnet-YOLO-PRN is supported by OpenCV.

Hi @WongKinYiu

I ran a speed test of this network on an Intel CPU. It looks like it is almost 5 times slower than the Tiny Yolov3 PRN network on CPU as well. The results are below:

OpenCV: 3.4.10-pre (https://github.com/opencv/opencv/tree/377dd04224630e835cce8c7d67e651cae73fd3b3)
CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz
Hard Drive Type: HDD
Display: Off
Yolov3-Tiny-PRN: 21.62 FPS
EfficientNetB0-Yolov3: 4.72 FPS

It looks like depth-wise convolutions are slow on CPU as well. Any thoughts?
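The "almost 5 times slower" figure can be checked directly from the two FPS values reported above. This is only arithmetic on those numbers; the variable names are made up for the example.

```python
fps_tiny_prn = 21.62      # Yolov3-Tiny-PRN, from the results above
fps_efficientnet = 4.72   # EfficientNetB0-Yolov3, from the results above

# How many times slower EfficientNetB0-Yolov3 is than Yolov3-Tiny-PRN.
slowdown = fps_tiny_prn / fps_efficientnet

# The same comparison expressed as time per frame, in milliseconds.
ms_tiny_prn = 1000.0 / fps_tiny_prn
ms_efficientnet = 1000.0 / fps_efficientnet

print(f"slowdown: {slowdown:.2f}x")
print(f"per frame: {ms_tiny_prn:.1f} ms vs {ms_efficientnet:.1f} ms")
```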

Thanks

mmaaz60 avatar Mar 21 '20 10:03 mmaaz60

Take a look at the comparison: https://github.com/AlexeyAB/darknet/issues/5079

AlexeyAB avatar Mar 21 '20 11:03 AlexeyAB

You can also use efficientnet-lite instead: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite

WongKinYiu avatar Mar 21 '20 15:03 WongKinYiu

It would be interesting to implement efficientnet-lite-yolov3 in this repo.

mmaaz60 avatar Mar 21 '20 16:03 mmaaz60

I added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg

AlexeyAB avatar Mar 23 '20 23:03 AlexeyAB

I added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg

@AlexeyAB Where can we see the performance?

becauseofAI avatar Mar 24 '20 04:03 becauseofAI

@AlexeyAB

efficientnet-lite3.cfg: top-1 69.1%, top-5 89.6%.

WongKinYiu avatar Apr 30 '20 06:04 WongKinYiu

@AlexeyAB

efficientnet-lite3-leaky.cfg: top-1 73.0%, top-5 92.4%.

WongKinYiu avatar Jun 09 '20 01:06 WongKinYiu

@AlexeyAB @WongKinYiu

Hi, does anyone have the efficientnet-lite3.weights file? Could you share it with me?

linyib avatar Mar 12 '24 08:03 linyib