Comparison of some models on CPU vs VPU (neurochip) vs GPU
- batch=1 (sync-mode)
- CPU, VPU:
  - OpenCV 4.2.0 (master branch, 21 Mar 2020)
  - OpenVINO 2020.1.033
- GPU:
  - CUDA 10.0
  - cuDNN 7.4.2
  - Darknet (22 Mar 2020), built with GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1
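For anyone reproducing the CPU/VPU rows below, here is a minimal Python sketch (assuming a Python OpenCV install; the version check simply mirrors the environment listed above) to confirm that the OpenCV build actually has the Inference Engine backend:

```python
# Minimal sketch (assumes a Python OpenCV install): check that the OpenCV
# build matches the setup above, i.e. that the Inference Engine (OpenVINO)
# backend is available for the CPU/VPU measurements.
import cv2

print("OpenCV version:", cv2.__version__)            # 4.2.0 was used for the rows below
build_info = cv2.getBuildInformation()
print("Inference Engine backend:", "Inference Engine" in build_info)
print("CUDA in OpenCV build:", "NVIDIA CUDA" in build_info)  # GPU rows use Darknet+cuDNN instead
```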
Accuracy and FPS:
Model | AP50...95 (MSCOCO), accuracy | mAP50 (MSCOCO), accuracy | CPU - 90 Watt - FP32 (Intel Core i7-6700K 4GHz, 8 logical cores) OpenCV-DLIE, FPS | VPU - 2 Watt - FP16 (Intel Myriad X) OpenCV-DLIE, FPS | GPU - 175 Watt - FP32/16 (nVidia GeForce RTX 2070) Darknet-cuDNN, FPS |
---|---|---|---|---|---|
yolov4-tiny 416x416 | - | 40.2% | - | - | 330 |
yolov3-tiny 416x416 | - | 33.1% | 35 | 6.5 | 340 |
yolov3-tiny-PRN 416x416 | - | 33.1% | 46 | 5.3 | 370 |
EfficientNetB0-Yolo 416x416 | - | 45.5% | 11 | - | 55 |
yolov3 416x416 | 31.0% | 55.3% | - | - | - |
yolov3-spp 512x512 | - | ~59.6% | 3.3 | 1.1 | 52 |
csresnext50-opt 512x512 | 42.4% | 64.4% | 3.5 | 0.64 | 37 |
csdarknet53-opt 256x256 async=3 | 33.3% | 53.0% | 14 | 11 | 74 |
csdarknet53-opt 512x512 | 42.4% | 64.5% | 3.5 | 1.23 | 50 |
csdarknet53-mish 512x512 (YOLOv4) | 43.0% | 64.9% | - | - | 50 |
csresnext50-opt 608x608 | 43.2% | 65.4% | - | - | 34 |
csdarknet53-mish 608x608 (YOLOv4) | 43.5% | 65.7% | - | - | 37 |
@AlexeyAB Hello,
So currently EfficientNetB0-Yolo is the fastest model on VPU?
@WongKinYiu Hi,
Yes, it seems the VPU (Intel Myriad X) is highly optimized for grouped convolutions and maybe for SE-blocks. I will test it more.
Maybe with the new Google Coral Edge TPU the performance ratio will, in general, be the same as with the Intel Myriad X.
So maybe it makes sense to train GhostNet (ghostnet.cfg.txt) and yolov3-tiny-3l-ghostnet (as a new tiny-yolo model): https://github.com/AlexeyAB/darknet/issues/4418#issue-530577441
@AlexeyAB Thanks,
GhostNet is now training: 40k/800k iterations.
@WongKinYiu Are you training GhostNet with CutMix + Mosaic + Label-smoothing?
Also, did we get an improvement for any network with DropBlock?
This is a fantastic resource. If at all possible, it'd be great to also see results for batch=4 or similar.
@AlexeyAB No, just the ghostnet.cfg.txt you provided before.
@WongKinYiu I also added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg that you can try to train with subdivisions=6 or 4.
@AlexeyAB Thanks, I am looking through the code of the new commits.
@AlexeyAB I set subdivisions=4 and the training has started now.
Hi @AlexeyAB, when you test the CPU and VPU, do you use FP32? As far as I know, the VPU can use FP16 and INT8. This information is very important.
@ShaneHsieh I added this information: the CPU uses FP32, the VPU uses FP16, and the GPU uses FP32/16 (Tensor Cores). Each device uses the lowest floating-point precision that increases speed without losing accuracy.
Thanks. Comparing the CPU and GPU when using FP32, the CPU can get better performance with EfficientNetB0-Yolo. That is good information.
What does OpenCV-DLIE mean?
OpenCV-DLIE is OpenCV with the Deep Learning Inference Engine backend, provided by the OpenVINO Toolkit.
Yes, you can use the OpenCV dnn module to run the models, for example yolov3, yolov3-tiny-prn, EfficientNetB0-Yolo, ...
But because the Mish activation function and the "eliminate grid sensitivity" scaling are not yet supported by the OpenCV dnn module, you cannot run yolov4 at this time.
Does it support AlexeyAB's version? So far I have only found the TensorFlow YOLO version that OpenVINO supports.
For your reference: https://github.com/opencv/opencv/pull/16436
Will you please give me a tutorial on how to convert the cfg file to the XML format that OpenVINO supports? I saw the question "Does the OpenCV-OpenVINO version support the Yolo v3 network?" on the site; it may have been asked by AlexeyAB.
Darknet is supported already. https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV
@andeyeluguo For using Yolo with OpenVINO (on CPU, GPU, VPU, ...) you should
- install OpenVINO as usual
- install OpenCV with OpenVINO-backend: https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend
- run yolov3.cfg + yolov3.weights by using OpenCV-dnn (https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html); examples of how to use Yolo (a minimal Python sketch also follows this list):
- https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
- https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py
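A minimal Python sketch of those steps (the file names yolov3.cfg / yolov3.weights / dog.jpg are placeholders; it assumes OpenCV was built with the Inference Engine backend, and follows the linked object_detection.py sample rather than replacing it):

```python
# Minimal sketch of the steps above: run a Darknet cfg/weights pair through
# OpenCV-dnn with the OpenVINO Inference Engine backend.
# "yolov3.cfg", "yolov3.weights" and "dog.jpg" are placeholder file names.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
# DNN_TARGET_CPU -> FP32 on CPU; DNN_TARGET_MYRIAD -> FP16 on Myriad X / NCS2
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

img = cv2.imread("dog.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outs = net.forward(net.getUnconnectedOutLayersNames())

# Each detection row: [cx, cy, w, h, objectness, class scores...] (relative coords)
h, w = img.shape[:2]
for out in outs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            print(class_id, confidence, cx - bw / 2, cy - bh / 2, bw, bh)
```

Changing the target to cv2.dnn.DNN_TARGET_MYRIAD runs the same model on a Myriad X / NCS2 in FP16.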
YOLOv4 will be supported in OpenCV+OpenVINO soon: https://github.com/opencv/opencv/issues/17148
I added Yolo v2 to OpenCV 2.5 years ago: https://github.com/opencv/opencv/pull/9705
Can these models also be run on NCS 2 using the OpenCV DNN module with IE backend?
@mmaaz60 It seems like that is the case. We will be trying it on DepthAI (Myriad X based) shortly and will circle back.
Also @AlexeyAB if you have any instructions on how to use YOLOv4 on VPU, we'd be keen to try them out on DepthAI.
@Luxonis-Brandon
The current version of YOLOv4 is for real-time on GPU. Later we will release YOLOv4-VPU for real-time (>= 30 FPS) on VPU.
There are two ways to run YOLOv4 on MyriadX:
- Support for YOLOv4 in OpenVINO - Wait until it is added to OpenVINO
- Support for YOLOv4 in OpenCV-dnn (with OpenVINO IE-backend ) - wait for solving this issue: https://github.com/opencv/opencv/issues/17148
Right now, you can try to use a slightly simpler version of YOLOv4, which is 0.5% worse in accuracy, on the Intel Myriad X VPU by using C++ with OpenVINO:
- either (width=512 height=512 in the cfg: 42.4% AP accuracy, 1.2 FPS) - look at the table https://github.com/AlexeyAB/darknet/issues/5079#issue-585403577
- or (width=320 height=320 in the cfg: 40.5% AP, 3 FPS)
- or (width=320 height=320 in the cfg: 40.5% AP, ~7 FPS with async=3 streams)
Use:
- cfg: https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1
- weights: https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV
- example: https://github.com/opencv/open_model_zoo/tree/master/demos/object_detection_demo_yolov3_async
- just change anchors https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L118-L119 to these values: https://github.com/AlexeyAB/darknet/blob/36c73c5b9e3f2e72049fb68566e32632f6c70e85/cfg/yolov4.cfg#L1141
- instead of this code: https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L196-L197 use this code:
// actually it should be 1.05, 1.1 and 1.2 for the corresponding [yolo] layers instead of 1.1
double x = (col + output_blob[box_index + 0 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_w;
double y = (row + output_blob[box_index + 1 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_h;
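As a hedged illustration (placeholder function, not the demo's actual code) of what those two patched lines compute, the x/y decoding with the "eliminate grid sensitivity" scale looks like this; the per-layer scale_x_y values 1.05, 1.1 and 1.2 come from the [yolo] layers of yolov4.cfg, while the C++ patch above approximates all of them with a single 1.1:

```python
# Hedged sketch (placeholder function, not the demo's code): decoding of the
# box center with the "eliminate grid sensitivity" scale. yolov4.cfg sets
# scale_x_y = 1.05, 1.1 and 1.2 in its three [yolo] layers; the C++ patch
# above uses a single 1.1 for all layers as an approximation.
def decode_xy(col, row, tx, ty, side, resized_w, resized_h, scale_x_y=1.1):
    # tx, ty: the network's sigmoid-activated x/y outputs for this cell
    # col, row: cell indices; side: grid size of this [yolo] layer
    x = (col + tx * scale_x_y - (scale_x_y - 1) / 2) / side * resized_w
    y = (row + ty * scale_x_y - (scale_x_y - 1) / 2) / side * resized_h
    return x, y
```

With scale_x_y = 1.0 this reduces to the standard YOLOv3 decoding used by the unmodified demo.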
@Luxonis-Brandon
I just tested csdarknet53-opt (YOLOv4 without Mish; in the cfg set width=256 height=256; 33.3% AP | 53.0% AP50) on your DepthAI (Myriad X) device with network resolution 256x256 and async=3 by using OpenCV (OpenVINO IE-backend), and got 11 FPS.
- cfg: https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1
- weights: https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV
- source:
- C++: https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
- Python: https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py
OpenCV 4.4.0-pre, compiled by myself. OpenVINO 2020.R3, Myriad.
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
Input 416x416:
- efficient-b0: 395 ms
- yolov3: 550 ms
- yolov3-tiny-prn: 168 ms
- yolov3-tiny: 128 ms
- yolov4: 940 ms
- efnet-coco: 395 ms
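For reference, a minimal sketch of how such per-inference times can be read back from OpenCV dnn on the MYRIAD target (model and image paths are placeholders):

```python
# Minimal sketch (placeholder paths): time one forward pass on the MYRIAD
# target using OpenCV's built-in profiler.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

img = cv2.imread("dog.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
net.forward(net.getUnconnectedOutLayersNames())

ticks, _ = net.getPerfProfile()   # total time of the last forward() in ticks
print("inference: %.1f ms" % (ticks * 1000.0 / cv2.getTickFrequency()))
```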
YOLOv4-tiny released: https://github.com/AlexeyAB/darknet/issues/6067
Hi, does anyone have the efficientnet-lite3.weights file? If so, can you share it with me?