darknet
EfficientNetb0-Yolo speed slow
Hi @AlexeyAB, thanks for your work.
I tried to run EfficientNetb0-Yolo on a 1080 Ti, but it only reaches 10 fps.
EfficientNetb0-Yolo is only 3.7 BFLOPs, so why is it running so slowly?
I would like to increase the inference speed. What should I do?
Thank you very much.
Depthwise convolution models like efficientnet are very slow on GPU. If you need real-time performance on GPU, consider yolov3-spp or csresnext50.
Hi @LukeAI,
I got it, thank you very much.
Can I run it on CPU in real time?
I tried to run it on CPU with OpenCV, but it failed.
What should I do?
Thanks
Ask there when EfficientNet-Yolo will be added to OpenCV-dnn: https://github.com/opencv/opencv/issues/15987
At least the Swish and Mish activations have been added: https://github.com/opencv/opencv/pull/15808
Hi @AlexeyAB, thanks for your reply. I will follow it.
@Code-Fight @AlexeyAB
Our YOLOv3-tiny-PRN runs at 48 fps on an i7-8750H CPU and 71 fps on an i9-9900K using the OpenCV dnn module (80 classes): https://github.com/opencv/opencv/pull/15739 Maybe you can consider it until efficientnet-YOLO-PRN is supported by OpenCV.
Hi @WongKinYiu, thanks for your work on "YOLOv3-tiny-PRN". I have used this model and it works well. Thanks again.
@LukeAI @WongKinYiu @AlexeyAB
I found that both efficientnet and cspresnext50 use groups in their .cfg, like the following:
[convolutional]
batch_normalize=1
filters=128
size=3
groups=32
stride=1
pad=1
activation=leaky
Given the comment at https://github.com/AlexeyAB/darknet/issues/4447#issuecomment-561611786 that "depthwise convolution models like efficientnet are very slow on GPU", why can cspresnext50 achieve real-time on GPU?
Because it uses group convolution, not depth-wise convolution.
groups=1 is equivalent to a vanilla convolution; its base (channels per group) equals k, the number of channels. groups=k is equivalent to a depth-wise convolution; its base equals 1. In ResNeXt50, groups=32 and the base is 4~32, so its speed falls between those two settings.
CSPResNeXt50 can achieve real-time on GPU because it has 40% less computation than YOLOv3.
Also, look at EfficientNet-EdgeTPU: even though it has 7 times the computation of EfficientNet, it runs faster than EfficientNet.
So if you want real-time performance on a mobile GPU using depth-wise convolution, you may have to lower its computation to 0.4 BFLOPs, as in ThunderNet, for example.
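To make the groups settings above concrete, here is a minimal Python sketch that counts channels per group, weights, and multiply-accumulates for a single convolution layer. The function name grouped_conv_stats and the 52x52 output size are assumptions chosen for illustration; only filters=128, size=3 and groups=32 come from the .cfg excerpt in this thread. The sketch counts arithmetic only and does not model GPU runtime, which is exactly why a low BFLOPs figure does not guarantee high fps.

```python
def grouped_conv_stats(in_channels, out_channels, kernel_size, groups, out_h, out_w):
    """Return (input channels per group, weight count, multiply-accumulates)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only sees the input channels of its own group.
    in_per_group = in_channels // groups
    weights = out_channels * in_per_group * kernel_size * kernel_size
    # Every weight is used once per output position.
    macs = weights * out_h * out_w
    return in_per_group, weights, macs


# Hypothetical layer: 128 -> 128 channels, 3x3 kernel, 52x52 output map.
# groups=1 is a vanilla convolution, groups=128 is depth-wise, groups=32 sits in between.
for g in (1, 32, 128):
    base, weights, macs = grouped_conv_stats(128, 128, 3, g, 52, 52)
    print(f"groups={g:3d}  channels/group={base:3d}  weights={weights:6d}  MACs={macs}")
```

The arithmetic drops in proportion to the number of groups, but each group also does less work per byte of data it touches, which is one plausible reason depth-wise layers use the GPU poorly despite the small operation count.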
@WongKinYiu You're so awesome that I don't have words. Fabulous work on the OpenCV integration.
@WongKinYiu
Thanks a lot.
That lets me learn more about group convolution and depth-wise convolution.
Hi @WongKinYiu
I ran a speed test of this network on an Intel CPU. It looks like it is almost 5 times slower than the Tiny Yolov3 PRN network on CPU as well. The results are below:
OpenCV: 3.4.10-pre (https://github.com/opencv/opencv/tree/377dd04224630e835cce8c7d67e651cae73fd3b3)
CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz
Hard Drive Type: HDD
Display: Off
Yolov3-Tiny-PRN: 21.62 FPS
EfficientNetB0-Yolov3: 4.72 FPS
It looks like depth-wise convolutions are slow on CPU as well. Any thoughts?
Thanks
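For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch of an FPS measurement. The helper measure_fps and the dummy_inference stand-in are hypothetical names, not part of darknet or OpenCV, and this is not necessarily how the numbers above were obtained. With OpenCV, run_once would typically wrap a single forward pass of a network loaded through cv2.dnn.readNetFromDarknet, with the input blob already set.

```python
import time


def measure_fps(run_once, warmup=3, iterations=20):
    """Average calls per second of run_once, after a few untimed warm-up calls."""
    # Warm-up calls keep one-time setup costs out of the measurement.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


# Stand-in workload (assumption): replace with one forward pass of the model under test.
def dummy_inference():
    time.sleep(0.01)


print(f"{measure_fps(dummy_inference):.2f} FPS")
```

Timing only the forward pass, with image loading and display left outside run_once, keeps the comparison between two networks focused on inference cost.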
Take a look at the comparison: https://github.com/AlexeyAB/darknet/issues/5079
You can also use efficientnet-lite instead: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite
It would be interesting to implement efficientnet-lite-yolov3 in this repo.
I added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg
@AlexeyAB Where can we see the performance?
@AlexeyAB
efficientnet-lite3.cfg: top-1 69.1%, top-5 89.6%.
@AlexeyAB
efficientnet-lite3-leaky.cfg: top-1 73.0%, top-5 92.4%.
@AlexeyAB @WongKinYiu
Hi, does anyone have the efficientnet-lite3.weights file? Could you share it with me?