Comparison of some models on CPU vs VPU (neurochip) vs GPU

Status: Open. AlexeyAB opened this issue 4 years ago • 28 comments

  • batch=1 (sync-mode)
  • CPU, VPU
    • OpenCV 4.2.0 (master-branch 21 Mar 2020)
    • OpenVINO 2020.1.033
  • GPU
    • CUDA 10.0
    • cuDNN 7.4.2
    • Darknet (Mar 22, 2020) GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1

Accuracy and FPS:

| Model | AP50...95 (MSCOCO), accuracy | mAP50 (MSCOCO), accuracy | CPU, 90 W, FP32 (Intel Core i7-6700K 4GHz, 8 logical cores), OpenCV-DLIE, FPS | VPU, 2 W, FP16 (Intel Myriad X), OpenCV-DLIE, FPS | GPU, 175 W, FP32/16 (nVidia GeForce RTX 2070), Darknet-cuDNN, FPS |
|---|---|---|---|---|---|
| yolov4-tiny 416x416 | | 40.2% | - | - | 330 |
| yolov3-tiny 416x416 | | 33.1% | 35 | 6.5 | 340 |
| yolov3-tiny-PRN 416x416 | | 33.1% | 46 | 5.3 | 370 |
| EfficientNetB0-Yolo 416x416 | | 45.5% | 11 | - | 55 |
| yolov3 416x416 | 31.0% | 55.3% | - | - | - |
| yolov3-spp 512x512 | | ~59.6% | 3.3 | 1.1 | 52 |
| csresnext50-opt 512x512 | 42.4% | 64.4% | 3.5 | 0.64 | 37 |
| csdarknet53-opt 256x256 async=3 | 33.3% | 53.0% | 14 | 11 | 74 |
| csdarknet53-opt 512x512 | 42.4% | 64.5% | 3.5 | 1.23 | 50 |
| csdarknet53-mish 512x512 (YOLOv4) | 43.0% | 64.9% | - | - | 50 |
| csresnext50-opt 608x608 | 43.2% | 65.4% | - | - | 34 |
| csdarknet53-mish 608x608 (YOLOv4) | 43.5% | 65.7% | - | - | 37 |

AlexeyAB, Mar 21 '20

@AlexeyAB Hello,

So currently EfficientNetB0-Yolo is the fastest model on VPU?

WongKinYiu, Mar 21 '20

@WongKinYiu Hi,

Yes, it seems the VPU (Intel Myriad X) is highly optimized for grouped convolutions and maybe SE-blocks. I will test it more.

Perhaps the performance ratios on the new Google Coral Edge TPU will, in general, be the same as on the Intel Myriad X.

So it may make sense to train GhostNet (ghostnet.cfg.txt) and yolov3-tiny-3l-ghostnet (as a new tiny-yolo model): https://github.com/AlexeyAB/darknet/issues/4418#issue-530577441

AlexeyAB, Mar 21 '20

@AlexeyAB Thanks,

GhostNet is now training: 40k/800k iterations.

WongKinYiu, Mar 21 '20

@WongKinYiu Do you train ghostnet with CutMix+Mosaic+Label-smoothing?

Also, did we get an improvement for any network with DropBlock?

AlexeyAB, Mar 23 '20

This is a fantastic resource. If at all possible, it would be great to also see results for "batch=4" or similar.

LukeAI, Mar 23 '20

@AlexeyAB No, just the ghostnet.cfg.txt you provided before.

WongKinYiu, Mar 23 '20

@WongKinYiu I also added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg, which you can try to train with subdivisions=6 or 4.

AlexeyAB, Mar 23 '20

@AlexeyAB Thanks, I am looking at the code of the new commits.

WongKinYiu, Mar 23 '20

@AlexeyAB I set subdivisions=4 and the training has started.

WongKinYiu, Mar 25 '20

Hi @AlexeyAB, when you test on CPU and VPU, do you use FP32? As far as I know, the VPU can use FP16 and INT8. This information is very important.

ShaneHsieh, Mar 27 '20

@ShaneHsieh I added this information: the CPU uses FP32, the VPU uses FP16, and the GPU uses FP32/16 (Tensor Cores). Each device uses the lowest floating-point precision that increases speed without loss of accuracy.

AlexeyAB, Mar 27 '20

Thanks. Comparing CPU and GPU at FP32, EfficientNetB0-Yolo gets relatively better performance on the CPU. That is good information.

ShaneHsieh, Mar 30 '20

What does OpenCV-DLIE mean?

andeyeluguo, Apr 26 '20

OpenCV-DLIE: OpenCV with the Deep Learning Inference Engine backend, provided by the OpenVINO Toolkit.

WongKinYiu, Apr 26 '20

Yes, you can use the OpenCV dnn module to run the models, for example yolov3, yolov3-tiny-prn, efficientnetb0-yolo...

But because the Mish activation function and "eliminate grid sensitivity" are not yet supported by the OpenCV dnn module, you cannot run YOLOv4 at this time.

WongKinYiu, Apr 26 '20

Does it support AlexeyAB's version? So far I have only found the TensorFlow version of YOLO that OpenVINO supports.

andeyeluguo, Apr 26 '20

For your reference: https://github.com/opencv/opencv/pull/16436

WongKinYiu, Apr 26 '20

Could you please give me a tutorial on how to convert the cfg file to the XML format that OpenVINO supports? I saw the question "Does OpenCV-OpenVINO version supports Yolo v3 network?" on the site; it may have been asked by AlexeyAB.

andeyeluguo, Apr 26 '20

Darknet is supported already. https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV

WongKinYiu, Apr 26 '20

@andeyeluguo To use Yolo with OpenVINO (on CPU, GPU, VPU, ...), you should:

  1. install OpenVINO as usual
  2. install OpenCV with OpenVINO-backend: https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend
  3. run yolov3.cfg + yolov3.weights using OpenCV-dnn (tutorial: https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html); examples of how to use Yolo:
    • https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
    • https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py

YOLOv4 will be supported for OpenCV+OpenVINO soon: https://github.com/opencv/opencv/issues/17148

I added Yolo v2 to OpenCV 2.5 years ago: https://github.com/opencv/opencv/pull/9705

AlexeyAB, Apr 26 '20

Can these models also be run on NCS 2 using the OpenCV DNN module with IE backend?

mmaaz60, Apr 27 '20

@mmaaz60 it seems like that is the case. We will be trying on DepthAI (Myriad X based) shortly and will circle back.

Also @AlexeyAB if you have any instructions on how to use YOLOv4 on VPU, we'd be keen to try them out on DepthAI.

Luxonis-Brandon, Apr 28 '20

@Luxonis-Brandon

The current version of YOLOv4 is for real-time on GPU. Later we will release YOLOv4-VPU for real-time (>= 30 FPS) on VPU.

[Image: modern_gpus]


There are two ways to run YOLOv4 on MyriadX:

  1. Support for YOLOv4 in OpenVINO: wait until it is added to OpenVINO
  2. Support for YOLOv4 in OpenCV-dnn (with the OpenVINO IE-backend): wait until this issue is solved: https://github.com/opencv/opencv/issues/17148

Right now, you can try a slightly simpler version of YOLOv4, which is 0.5% worse, on the Intel Myriad X VPU using C++ with OpenVINO:

  • width=512 height=512 in cfg: accuracy 42.4% AP, speed 1.2 FPS (see the table: https://github.com/AlexeyAB/darknet/issues/5079#issue-585403577)
  • or width=320 height=320 in cfg: 40.5% AP, 3 FPS
  • or width=320 height=320 in cfg: 40.5% AP, ~7 FPS with async=3 streams

Use:

  • cfg: https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1
  • weights: https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV
  • example: https://github.com/opencv/open_model_zoo/tree/master/demos/object_detection_demo_yolov3_async
    1. change the anchors at https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L118-L119 to these values: https://github.com/AlexeyAB/darknet/blob/36c73c5b9e3f2e72049fb68566e32632f6c70e85/cfg/yolov4.cfg#L1141
    2. instead of this code: https://github.com/opencv/open_model_zoo/blob/7d235755e2d17f6186b11243a169966e4f05385a/demos/object_detection_demo_yolov3_async/main.cpp#L196-L197, use this code:
```cpp
// scale_x_y: should actually be 1.05, 1.1 and 1.2 for the corresponding [yolo] layers instead of 1.1
const double scale_x_y = 1.1;
double x = (col + output_blob[box_index + 0 * side_square] * scale_x_y + (1 - scale_x_y) / 2) / side * resized_im_w;
double y = (row + output_blob[box_index + 1 * side_square] * scale_x_y + (1 - scale_x_y) / 2) / side * resized_im_h;
```

AlexeyAB, Apr 28 '20

@Luxonis-Brandon

I just tested csdarknet53-opt (YOLOv4 without Mish; set width=256 height=256 in the cfg: 33.3% AP | 53.0% AP50) on your DepthAI (Myriad X) device at network resolution 256x256 with async=3, using OpenCV (OpenVINO IE-backend), and got 11 FPS.

  • cfg: https://drive.google.com/open?id=15WhN7W8UZo7-4a0iLkx11Z7_sDVHU4l1
  • weights: https://drive.google.com/open?id=1ULnPnamS5A6lOgidlBXD24IdxoDAFaaV
  • source:
    • C++: https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
    • Python: https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.py

AlexeyAB, Apr 28 '20

[Image: OpenCV_Vs_TensorRT]

AlexeyAB, Jun 17 '20

OpenCV 4.4.0-pre, self-compiled; OpenVINO 2020.R3; Myriad target via net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD).

Input 416x416

  • efficient-b0: 395 ms
  • yolov3: 550 ms
  • yolov3-tiny-prn: 168 ms
  • yolov3-tiny: 128 ms
  • yolov4: 940 ms
  • efnet-coco: 395 ms

ausk, Jun 23 '20

YOLOv4-tiny released: https://github.com/AlexeyAB/darknet/issues/6067

AlexeyAB, Jun 25 '20

Hi, does anyone have the efficientnet-lite3.weights file? Could you share it with me?

linyib, Mar 12 '24