
[WIP] a new network GlazedYolo

Open tkng opened this issue 5 years ago • 4 comments

This network is experimental; so far it cannot run on FPGAs.

Description

This network brings several recent ideas to our YOLOv2 implementation. In short, GlazedYolo = YOLOv2 + BlazeFace + MixConv + Group Convolution.

Architecture difference from LMFYolo

  • mainly using 5x5 convolution instead of 3x3 convolution
  • sometimes stride=2 convolution is used
  • using residual connections
  • group convolution is heavily used
  • downsampling rate is 16 (LMFYolo's downsampling rate is 32)
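A rough sense of why heavy group convolution makes 5x5 kernels affordable can be sketched by comparing weight counts. The channel counts (64 → 64) and groups=4 below are hypothetical, chosen only for illustration; they are not the actual GlazedYolo configuration.

```python
# Parameter-count sketch: dense vs. grouped 5x5 convolution.
# Channel counts and group count are illustrative assumptions, not GlazedYolo's.

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution with `groups` channel groups (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * k * k * c_out

dense_3x3 = conv_params(64, 64, 3)              # 36,864 weights
dense_5x5 = conv_params(64, 64, 5)              # 102,400 weights
grouped_5x5 = conv_params(64, 64, 5, groups=4)  # 25,600 weights

# With 4 groups, a 5x5 conv is cheaper than even a dense 3x3 conv:
print(dense_3x3, dense_5x5, grouped_5x5)
```

The same factor-of-`groups` saving applies to the multiply-accumulate count, which is why the larger receptive field need not blow up the GOPs budget.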

I guess the last difference is the most effective one from the point of view of accuracy. GlazedYolo achieves the following performance numbers (mAP@IoU=0.5).

| Network | WIDER_FACE (160x160) | PASCAL VOC (320x320) | GOPs @ 160x160 |
|---|---|---|---|
| LMFYolo (quantized) | 0.559 | 0.446 | 0.582 |
| GlazedYolo (quantized) | 0.727 | 0.472 | 0.697 |

On PASCAL VOC the difference is small, but there is a huge difference on WIDER_FACE. When the input image size is enlarged to 320x320, GlazedYolo (quantized) achieves 81.9% mAP on the WIDER_FACE dataset.
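The WIDER_FACE gap is consistent with the downsampling-rate change: halving the rate from 32 to 16 gives the detection head four times as many grid cells, which matters most for the many small faces in WIDER_FACE. A quick sanity check of the grid sizes (the helper function is illustrative, not blueoil code):

```python
# Output grid cells for a given input resolution and downsampling rate.
# Illustrates why rate 16 helps small objects: 4x more cell locations than rate 32.

def grid_cells(input_size, downsample):
    side = input_size // downsample
    return side * side

lmf_cells = grid_cells(160, 32)     # 5 x 5  = 25 cells
glazed_cells = grid_cells(160, 16)  # 10 x 10 = 100 cells
print(lmf_cells, glazed_cells)
```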

Further direction

To make it easier to run on our accelerator, I'm planning the following experiments:

  • replace stride=2 conv with max pooling or space_to_depth
  • remove 5x5 conv
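For reference, space_to_depth is a lossless way to get the same spatial reduction as a stride=2 convolution: each 2x2 patch is stacked into the channel dimension, so a following stride=1 convolution still sees all the input values. A minimal numpy sketch (assuming HWC layout; not the blueoil implementation):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange (H, W, C) -> (H/block, W/block, C*block*block).

    Lossless alternative to the spatial reduction of a stride-2 conv:
    each block x block patch is moved into the channel dimension.
    """
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the block offsets next to channels
    return x.reshape(h // block, w // block, c * block * block)

x = np.arange(16 * 16 * 8, dtype=np.float32).reshape(16, 16, 8)
y = space_to_depth(x)
print(y.shape)  # (8, 8, 32)
```

Unlike max pooling, no activations are discarded; the trade-off is a 4x wider channel dimension for the next layer.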

Motivation and Context

We want to have a better network for object detection, without changing the computation cost drastically.

How has this been tested?

Accuracy has been checked through several training and evaluation experiments.

Screenshots (if appropriate):

None

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature / Optimization (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • [ ] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.

tkng avatar Sep 18 '19 05:09 tkng

> To make it easier to run it on our accelerator, I'm planning following experiments
>
>   • replace stride=2 conv with max pooling or space_to_depth
>   • remove 5x5 conv

If we are talking about the new one, stride=2 & 5x5 sounds OK to me, if the amount of compute doesn't change... :thinking:

n-nez avatar Sep 25 '19 01:09 n-nez

To support stride=2, 5x5 conv on CPU:

  • very easy: 5x5 conv
  • easy: stride=2 for AArch32
  • hard (or no optimization, just ignore unused results): stride=2 for other architectures
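The "no optimization, just ignore unused results" fallback works because a stride-2 convolution is exactly a stride-1 convolution with three quarters of the outputs thrown away. A small numpy check with a naive single-channel convolution (illustrative code, not the actual kernel implementation):

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Naive valid 2D convolution of a single-channel image."""
    k = w.shape[0]
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = (patch * w).sum()
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10))
w = rng.standard_normal((5, 5))

strided = conv2d(x, w, stride=2)
# Compute everything at stride 1, then discard the unused results:
subsampled = conv2d(x, w, stride=1)[::2, ::2]
assert np.allclose(strided, subsampled)
```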

primenumber avatar Sep 25 '19 02:09 primenumber

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jun 12 '20 06:06 CLAassistant
