darknet Comparison of some models on CPU vs VPU (neurochip) vs GPU

batch=1 (sync-mode)
CPU, VPU
- OpenCV 4.2.0 (master-branch 21 Mar 2020)
- OpenVINO 2020.1.033
GPU
- CUDA 10.0
- cuDNN 7.4.2
- Darknet (Mar 22, 2020) GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1

Accuracy and FPS:

Model	AP50...95 (MSCOCO), accuracy	mAP50 (MSCOCO), accuracy	CPU - 90 Watt - FP32 (Intel Core i7-6700K 4GHz 8 Logical Cores) OpenCV-DLIE, FPS	VPU - 2 Watt - FP16 (Intel Myriad X) OpenCV-DLIE, FPS	GPU - 175 Watt - FP32/16 (nVidia GeForce RTX 2070) Darknet-cuDNN, FPS
yolov4-tiny 416x416		40.2%	-	-	330
yolov3-tiny 416x416		33.1%	35	6.5	340
yolov3-tiny-PRN 416x416		33.1%	46	5.3	370
EfficientNetB0-Yolo 416x416		45.5%	11	-	55
yolov3 416x416	31.0%	55.3%	-	-	-
yolov3-spp 512x512		~59.6%	3.3	1.1	52
csresnext50-opt 512x512	42.4%	64.4%	3.5	0.64	37
csdarknet53-opt 256x256 async=3	33.3%	53.0%	14	11	74
csdarknet53-opt 512x512	42.4%	64.5%	3.5	1.23	50
csdarknet53-mish 512x512 (YOLOv4)	43.0%	64.9%	-	-	50
csresnext50-opt 608x608	43.2%	65.4%	-	-	34
csdarknet53-mish 608x608 (YOLOv4)	43.5%	65.7%	-	-	37
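
To compare the devices on efficiency rather than raw speed, one rough option is to divide each FPS value by the wattage given in the column headers. The sketch below does that for three rows that have a number in every FPS column. It assumes the header wattages are nominal figures, not measured power draw, so the ratios are only indicative; the names `NOMINAL_WATTS`, `FPS` and `fps_per_watt` are made up for this illustration.

```python
# Nominal wattages taken from the table's column headers (not measured power draw).
NOMINAL_WATTS = {"CPU": 90.0, "VPU": 2.0, "GPU": 175.0}

# FPS values copied from table rows that have a number in all three FPS columns.
FPS = {
    "yolov3-tiny 416x416": {"CPU": 35.0, "VPU": 6.5, "GPU": 340.0},
    "yolov3-spp 512x512": {"CPU": 3.3, "VPU": 1.1, "GPU": 52.0},
    "csdarknet53-opt 512x512": {"CPU": 3.5, "VPU": 1.23, "GPU": 50.0},
}


def fps_per_watt(fps_by_device):
    """Divide each device's FPS by its nominal wattage."""
    return {dev: fps / NOMINAL_WATTS[dev] for dev, fps in fps_by_device.items()}


if __name__ == "__main__":
    for model, fps in FPS.items():
        ratios = fps_per_watt(fps)
        print(model, {dev: round(r, 3) for dev, r in ratios.items()})
```

Under these assumptions the VPU comes out ahead per watt for yolov3-tiny even though its raw FPS is the lowest of the three devices.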

Mar 21 '20 01:03 AlexeyAB

@AlexeyAB Hello,

So currently EfficientNetB0-Yolo is the fastest model on VPU?

Mar 21 '20 01:03 WongKinYiu

@WongKinYiu Hi,

Yes, it seems the VPU (Intel Myriad X) is highly optimized for grouped convolutions and maybe SE blocks. I will test it more.

Maybe with the new Google Coral Edge TPU the performance ratio will, in general, be the same as with Intel Myriad X.

So maybe it makes sense to train GhostNet (ghostnet.cfg.txt) and yolov3-tiny-3l-ghostnet (as a new tiny-yolo model): https://github.com/AlexeyAB/darknet/issues/4418#issue-530577441

Mar 21 '20 02:03 AlexeyAB

@AlexeyAB Thanks,

ghostnet is training now: 40k/800k iterations.

Mar 21 '20 02:03 WongKinYiu

@WongKinYiu Do you train ghostnet with CutMix+Mosaic+Label-smoothing?

Also, did we get an improvement for any network with DropBlock?

Mar 23 '20 18:03 AlexeyAB

This is a fantastic resource. If at all possible, it'd be great to also see results for "batch=4" or similar.

Mar 23 '20 19:03 LukeAI

@AlexeyAB No, just the ghostnet.cfg.txt you provided before.

Mar 23 '20 22:03 WongKinYiu

@WongKinYiu I also added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg that you can try to train with subdivisions=6 or 4

Mar 23 '20 23:03 AlexeyAB

@AlexeyAB Thanks, I am looking at the code of the new commits.

Mar 23 '20 23:03 WongKinYiu

@AlexeyAB I set subdivisions=4 and the training has started now.

Mar 25 '20 04:03 WongKinYiu

Hi @AlexeyAB, when you test on CPU and VPU, do you use FP32? As far as I know, the VPU can use FP16 and INT8. This information is very important.

Mar 27 '20 05:03 ShaneHsieh

@ShaneHsieh I added this information: CPU uses FP32, VPU uses FP16, GPU uses FP32/16 (Tensor Cores). Each device uses the lowest floating-point precision that increases speed without loss of accuracy.

Mar 27 '20 13:03 AlexeyAB

Thanks. Comparing CPU and GPU when using FP32, the CPU can get better performance with EfficientNetB0-Yolo. That is good information.

Mar 30 '20 03:03 ShaneHsieh

What does OpenCV-DLIE mean?

Apr 26 '20 06:04 andeyeluguo

OpenCV-DLIE means OpenCV with the Deep Learning Inference Engine, which is supported by the OpenVINO Toolkit.

Apr 26 '20 06:04 WongKinYiu

Yes, you can use the OpenCV dnn module to run the models, for example yolov3, yolov3-tiny-prn, efficientnetb0-yolo...

But because the Mish activation function and "eliminate grid sensitivity" are not yet supported by the OpenCV dnn module, you cannot run yolov4 at this time.

Apr 26 '20 07:04 WongKinYiu

Does it support AlexeyAB's version? So far I have only found the TensorFlow version of Yolo that OpenVINO supports.

Apr 26 '20 07:04 andeyeluguo

For your reference: https://github.com/opencv/opencv/pull/16436

Apr 26 '20 07:04 WongKinYiu

Could you please give me a tutorial on how to convert the cfg file to the xml format that OpenVINO supports? I saw the question "Does OpenCV-OpenVINO version supports Yolo v3 network?" on the site; it may have been asked by AlexeyAB.

Apr 26 '20 08:04 andeyeluguo

Darknet is supported already. https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV

Apr 26 '20 08:04 WongKinYiu

@andeyeluguo To use Yolo with OpenVINO (on CPU, GPU, VPU, ...) you should:

1. install OpenVINO as usual
2. install OpenCV with the OpenVINO backend: https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend
3. run yolov3.cfg + yolov3.weights by using OpenCV-dnn: https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html Examples of how to use Yolo:
- https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
- https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py
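
As a rough illustration of the last step, here is a minimal Python sketch of loading a Darknet model through OpenCV-dnn with the Inference Engine backend. It assumes an OpenCV build with the OpenVINO backend enabled. The function names (`load_yolo`, `detect`, `rows_to_boxes`), the 416 input size and the 0.5 threshold are choices made for this example only, and non-maximum suppression is left out for brevity; the linked samples are the complete reference.

```python
def load_yolo(cfg_path, weights_path, use_myriad=False):
    """Load a Darknet cfg/weights pair and select the Inference Engine backend."""
    import cv2  # imported lazily so rows_to_boxes() below works without OpenCV

    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
    # DNN_TARGET_MYRIAD selects the Myriad VPU; DNN_TARGET_CPU runs on the CPU.
    target = cv2.dnn.DNN_TARGET_MYRIAD if use_myriad else cv2.dnn.DNN_TARGET_CPU
    net.setPreferableTarget(target)
    return net


def detect(net, image, size=416, conf_threshold=0.5):
    """Run one synchronous forward pass and return the boxes above the threshold."""
    import cv2

    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (size, size), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    img_h, img_w = image.shape[:2]
    boxes = []
    for out in outputs:
        boxes.extend(rows_to_boxes(out.tolist(), img_w, img_h, conf_threshold))
    return boxes


def rows_to_boxes(rows, img_w, img_h, conf_threshold):
    """Convert rows of [cx, cy, w, h, objectness, class scores...] to pixel boxes.

    Coordinates in each row are relative to the image size (0..1).
    Returns tuples of (class_id, confidence, left, top, width, height).
    """
    boxes = []
    for row in rows:
        scores = row[5:]
        class_id = max(range(len(scores)), key=scores.__getitem__)
        confidence = scores[class_id]
        if confidence < conf_threshold:
            continue
        box_w, box_h = row[2] * img_w, row[3] * img_h
        left = row[0] * img_w - box_w / 2
        top = row[1] * img_h - box_h / 2
        boxes.append((class_id, confidence, left, top, box_w, box_h))
    return boxes
```

A call would look like `detect(load_yolo("yolov3.cfg", "yolov3.weights", use_myriad=True), frame)`, where `frame` is a BGR image such as one read with `cv2.imread`.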

YOLOv4 will be supported for OpenCV+OpenVINO soon: https://github.com/opencv/opencv/issues/17148

I added Yolo v2 to OpenCV 2.5 years ago: https://github.com/opencv/opencv/pull/9705

Apr 26 '20 10:04 AlexeyAB

Can these models also be run on NCS 2 using the OpenCV DNN module with IE backend?

Apr 27 '20 21:04 mmaaz60

@mmaaz60 it seems like that is the case. We will be trying on DepthAI (Myriad X based) shortly and will circle back.

Also @AlexeyAB if you have any instructions on how to use YOLOv4 on VPU, we'd be keen to try them out on DepthAI.

Apr 28 '20 16:04 Luxonis-Brandon

@Luxonis-Brandon

The current version of YOLOv4 is designed for real-time operation on GPU. Later we will release YOLOv4-VPU for real-time (>= 30 FPS) operation on VPU.

There are two ways to run YOLOv4 on MyriadX:

- Support for YOLOv4 in OpenVINO: wait until it is added to OpenVINO
- Support for YOLOv4 in OpenCV-dnn (with OpenVINO IE-backend): wait for this issue to be solved: https://github.com/opencv/opencv/issues/17148

Right now, you can try a slightly simpler version of YOLOv4, which is 0.5% worse, on the Intel MyriadX VPU by using C++ with OpenVINO, in one of these configurations:

- width=512 height=512 in cfg: 42.4% AP at 1.2 FPS (see the table: https://github.com/AlexeyAB/darknet/issues/5079#issue-585403577)
- width=320 height=320 in cfg: 40.5% AP at 3 FPS
- width=320 height=320 in cfg: 40.5% AP at ~7 FPS with async=3 streams

Use:

cfg: https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1
weights: https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV
example: https://github.com/opencv/open_model_zoo/tree/master/demos/object_detection_demo_yolov3_async
1. just change anchors https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L118-L119 to these values: https://github.com/AlexeyAB/darknet/blob/36c73c5b9e3f2e72049fb68566e32632f6c70e85/cfg/yolov4.cfg#L1141
2. instead of this code: https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L196-L197 use this code

// actually should be 1.05, 1.1 and 1.2 for the corresponding [yolo] layers instead of 1.1
            double x = (col + output_blob[box_index + 0 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_w;
            double y = (row + output_blob[box_index + 1 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_h;
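
For reference, the same centre computation can be written as a small Python function with the scale passed in per [yolo] layer instead of the hard-coded 1.1. This is only a sketch: the function name `decode_center` and its argument names are invented here, and it assumes the blob values for x and y are already in the 0..1 range, as the two lines above use them directly.

```python
def decode_center(col, row, tx, ty, side, scale_x_y, resized_w, resized_h):
    """Return the box centre (x, y) in pixels of the resized image.

    col, row   : grid cell indices
    tx, ty     : x/y offsets read from the output blob (assumed to be in 0..1)
    side       : number of grid cells along one side of this [yolo] layer
    scale_x_y  : scale for this layer (1.1 in the C++ lines above)
    """
    # (1 - s) / 2 from the C++ lines, written here as a shift that is subtracted.
    shift = (scale_x_y - 1.0) / 2.0
    x = (col + tx * scale_x_y - shift) / side * resized_w
    y = (row + ty * scale_x_y - shift) / side * resized_h
    return x, y


if __name__ == "__main__":
    # An offset of 0.5 stays at the cell centre for any scale.
    print(decode_center(6, 6, 0.5, 0.5, 13, 1.1, 416, 416))
```

With `scale_x_y=1.0` this reduces to the original YOLOv3 demo formula. With a scale above 1, offsets of 0 or 1 land slightly outside the cell boundary.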

Apr 28 '20 17:04 AlexeyAB

@Luxonis-Brandon

I just tested csdarknet53-opt (YOLOv4 without Mish; with width=256 height=256 set in the cfg: 33.3% AP | 53.0% AP50) on your DepthAI (Myriad X) device with network resolution 256x256 and async=3 by using OpenCV (OpenVINO IE-backend) and got 11 FPS.

cfg: https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1
weights: https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV
source:
- C++: https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
- Python: https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py

Apr 28 '20 21:04 AlexeyAB
