blueoil
[WIP] a new network GlazedYolo
This network is experimental; so far it cannot run on FPGAs.
Description
This network brings several recent ideas to our YOLOv2 implementation. In short, GlazedYolo = YoloV2 + BlazeFace + MixConv + Group Convolution.
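Of these ingredients, MixConv is the least self-explanatory: it splits a layer's channels into groups and convolves each group with a different kernel size. A minimal sketch of the channel split (the helper name and numbers are illustrative, not from the blueoil code):

```python
def mixconv_channel_split(num_channels, kernel_sizes):
    """Split `num_channels` as evenly as possible across kernel-size groups,
    the way MixConv assigns channels to its parallel kernels."""
    n_groups = len(kernel_sizes)
    base, rem = divmod(num_channels, n_groups)
    # Earlier groups absorb the remainder so every channel is covered.
    sizes = [base + (1 if i < rem else 0) for i in range(n_groups)]
    return list(zip(kernel_sizes, sizes))

# 32 channels over 3x3 and 5x5 kernels -> 16 channels each
print(mixconv_channel_split(32, [3, 5]))  # [(3, 16), (5, 16)]
```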
Architecture difference from LMFYolo
- mainly uses 5x5 convolutions instead of 3x3 convolutions
- uses stride=2 convolutions in some places
- uses residual connections
- makes heavy use of group convolution
- downsampling rate is 16 (LMFYolo's downsampling rate is 32)
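The cost trade-off behind the first and fourth points can be checked with a back-of-envelope multiply-accumulate count; the helper below and its layer sizes are my own illustration, not taken from the code:

```python
def conv_macs(h, w, c_in, c_out, k, groups=1, stride=1):
    """Multiply-accumulates of a k x k convolution on an h x w x c_in input.
    Each of the (h/stride)*(w/stride)*c_out outputs sums over
    k*k*(c_in/groups) input values."""
    return (h // stride) * (w // stride) * c_out * k * k * (c_in // groups)

# A 5x5 convolution with 4 groups costs less than a dense 3x3 convolution
# on the same hypothetical 40x40x64 -> 64 layer: 25/4 < 9 taps per output.
dense_3x3 = conv_macs(40, 40, 64, 64, k=3)
grouped_5x5 = conv_macs(40, 40, 64, 64, k=5, groups=4)
print(dense_3x3, grouped_5x5)  # 58982400 40960000
```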
I guess the last difference contributes the most to accuracy. GlazedYolo achieves the following performance numbers (mAP@IoU=0.5).
| | WIDER_FACE (160x160) | PASCALVOC (320x320) | GOPs@160x160 |
|---|---|---|---|
| LMFYolo (quantized) | 0.559 | 0.446 | 0.582 |
| GlazedYolo (quantized) | 0.727 | 0.472 | 0.697 |
On PASCALVOC the difference is small, but there is a large gap on WIDER_FACE. When the input image size is enlarged to 320x320, GlazedYolo (quantized) achieves 81.9% mAP on the WIDER_FACE dataset.
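The WIDER_FACE gain is consistent with the finer output grid: halving the downsampling rate quadruples the number of detector cells, which matters most for small objects such as faces. A quick sanity check of the grid sizes (pure arithmetic, assuming square inputs):

```python
def output_grid(input_size, downsample):
    """Side length of the detector's output grid for a square input."""
    return input_size // downsample

# LMFYolo (downsample 32) vs GlazedYolo (downsample 16) at 160x160 input
print(output_grid(160, 32), output_grid(160, 16))  # 5 10 -> 4x more cells
```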
Further direction
To make it easier to run on our accelerator, I'm planning the following experiments:
- replace stride=2 conv with max pooling or space_to_depth
- remove 5x5 conv
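space_to_depth is a drop-in for a stride-2 convolution's downsampling because it halves spatial resolution without discarding any pixel information. A minimal pure-Python version for a single-channel image (block size 2, list-of-lists layout; an illustrative sketch, not blueoil's implementation):

```python
def space_to_depth(img, block=2):
    """Rearrange an H x W single-channel image (list of lists) into an
    (H/block) x (W/block) grid where each cell stacks its block x block
    pixels into a channel list -- no information is discarded."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, block):
        row = []
        for x in range(0, w, block):
            # Flatten the block x block patch into the channel dimension.
            row.append([img[y + dy][x + dx]
                        for dy in range(block) for dx in range(block)])
        out.append(row)
    return out

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(space_to_depth(img))
# [[[1, 2, 5, 6], [3, 4, 7, 8]], [[9, 10, 13, 14], [11, 12, 15, 16]]]
```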
Motivation and Context
We want a better network for object detection without drastically changing the computation cost.
How has this been tested?
Accuracy was checked by running several experiments.
Screenshots (if appropriate):
None
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature / Optimization (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
> To make it easier to run it on our accelerator, I'm planning following experiments
> - replace stride=2 conv with max pooling or space_to_depth
> - remove 5x5 conv
If we are talking about the new one, stride=2 & 5x5 sound OK to me, as long as the amount of compute doesn't change... :thinking:
To support stride=2, 5x5conv on cpu:
- very easy: 5x5conv
- easy: stride=2 for AArch32
- hard (or no optimization, just ignore unused results): stride=2 for other architectures
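The "ignore unused results" route amounts to running the convolution densely (stride 1) and then keeping every other output: wasteful in compute, but it needs no new kernel. A 1-D illustration in pure Python (my sketch, not the accelerator code):

```python
def conv1d_dense(signal, kernel):
    """Valid 1-D convolution (cross-correlation) at stride 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def conv1d_stride2_via_subsample(signal, kernel):
    """Emulate stride=2 by computing all outputs and discarding the odd ones."""
    return conv1d_dense(signal, kernel)[::2]

sig, ker = [1, 2, 3, 4, 5, 6], [1, 0, -1]
print(conv1d_dense(sig, ker))                  # [-2, -2, -2, -2]
print(conv1d_stride2_via_subsample(sig, ker))  # [-2, -2]
```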