
Convolutions don't match paper

Open rmcavoy opened this issue 4 years ago • 13 comments

According to the original paper (see quotes below), the convolutions in this implementation appear to be incorrect: among other problems, normal convolutions are used where the paper specifies depthwise separable ones. With regard to the BiFPN:

Notably, to further improve the efficiency, we use depthwise separable convolution [4, 29] for feature fusion (in the bifpn), and add batch normalization and activation after each convolution.

Thus we know that in the BiFPN, EfficientDet uses depthwise separable convolutions, with batch norm and an activation function following each one, rather than the normal convolutions used here.

Batch normalization is added after every convolution with batch norm decay 0.997 and epsilon 1e-4.

This directly states that all convolutions have batch norm, but without stating whether depthwise separable convolutions are used outside the BiFPN. It seems likely to me that all convolutions are depthwise separable, because otherwise whichever convolution was not depthwise separable would become the efficiency bottleneck.
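For concreteness, here is a minimal PyTorch sketch of the conv block these quotes describe, assuming the standard depthwise-then-pointwise factorization. The class name is illustrative, and the SiLU/swish activation is an assumption, since the paper only says "activation":

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv + 1x1 pointwise conv, followed by BN and activation.

    BN settings follow the quoted paper values: decay 0.997 and epsilon
    1e-4. TF's decay is the retention factor, so in PyTorch terms
    momentum = 1 - 0.997 = 0.003.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch, momentum=0.003, eps=1e-4)
        self.act = nn.SiLU()  # swish; an assumption -- the paper just says "activation"

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```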

rmcavoy avatar Mar 05 '20 12:03 rmcavoy

I went through and confirmed that if depthwise separable convolutions are used everywhere, and the code is fixed to set Dbifpn, Dclass/box, Wbifpn, Wclass/box, etc. to the correct values (see the other issues), then the total size of the network, including both the backbone and the detector heads, matches the parameter counts from the paper to a close approximation (you need to neglect the parameters of the backbone's final classifier, since they aren't used in EfficientDet at all). With this update, I am only short between 0.01 and 0.5 million parameters for each of the models from D0 to D7, instead of being massively over on each when using normal convolutions.
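A rough way to reproduce this check (a sketch only; `model.backbone.classifier` is a hypothetical attribute name, adjust it to the repo's actual module layout):

```python
def detector_param_count_millions(model):
    # Total parameters minus the backbone's final classifier, which
    # EfficientDet never uses. `model.backbone.classifier` is a
    # hypothetical name for that unused head.
    total = sum(p.numel() for p in model.parameters())
    unused = sum(p.numel() for p in model.backbone.classifier.parameters())
    return (total - unused) / 1e6
```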

rmcavoy avatar Mar 06 '20 12:03 rmcavoy

Edit: Turns out Figure 3 is just misleading.

rmcavoy avatar Mar 06 '20 13:03 rmcavoy

@rmcavoy thank you for your notes. Am I right that you claim that all the convolution layers in the BiFPN and the box prediction network should be depthwise separable?

sevakon avatar Mar 09 '20 14:03 sevakon

@sevakon Yes, based on my parameter-count tests, every part of the detection network uses depthwise separable convolutions instead of normal convolutions. This is consistent with the original paper using "conv" to denote the depthwise separable convolutions in the BiFPN (as explicitly stated) and then using "conv" again in the classification/regression networks without clarifying otherwise.

rmcavoy avatar Mar 09 '20 16:03 rmcavoy

Also note that you need to make sure there is a batch norm after every depthwise separable layer to match the original paper.

rmcavoy avatar Mar 09 '20 16:03 rmcavoy

Just to be totally clear, "every part of the detection network" means all of the BiFPN and RetinaHead?

turner-rovco avatar Mar 09 '20 16:03 turner-rovco

Yes, only the BiFPN and the RetinaHead. The backbone EfficientNet remains unchanged versus what is there currently.

rmcavoy avatar Mar 09 '20 16:03 rmcavoy

@toandaominh1997 Are you going to respond to any of these issues ever?

bluesky314 avatar Mar 24 '20 08:03 bluesky314

Also, Figure 3 shows each layer of the box/classification network as having two (depthwise separable) convolutions per layer, since it surrounds the two convolutions with a dotted box just as it does the single layer of the BiFPN.

@rmcavoy, I had assumed, just like you, that each layer of the box/classification network has two convs, but according to the official implementation that's not the case.
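In code, the corrected head stacking would look roughly like this sketch (`DepthwiseSeparableConv` is the block sketched earlier in the thread; `make_head_layers` and its arguments are illustrative names):

```python
import torch.nn as nn

def make_head_layers(width, num_repeats, num_outputs):
    # One depthwise separable conv (with BN + activation) per repeated
    # layer -- not two. `num_repeats` plays the role of Dclass/box.
    layers = [DepthwiseSeparableConv(width, width) for _ in range(num_repeats)]
    # Final prediction conv; whether it also carries BN/activation is not
    # settled by the quotes above, so it is left plain here.
    layers.append(nn.Conv2d(width, num_outputs, 3, padding=1))
    return nn.Sequential(*layers)
```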

dkoguciuk avatar Mar 25 '20 16:03 dkoguciuk

@dkoguciuk Yep I discovered that as well but forgot to update this post.

rmcavoy avatar Mar 25 '20 17:03 rmcavoy

@rmcavoy, don't worry - you've helped a lot of people here with your comments :)

dkoguciuk avatar Mar 26 '20 10:03 dkoguciuk

Yes, only the BiFPN and the RetinaHead. The backbone EfficientNet remains unchanged versus what is there currently.

You are right. This repo changed the conv stride, so the pretrained weights are unusable and performance will be poor. I re-implemented EfficientDet step by step from the official implementation; for D0 through D5, the results look good.

https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch

zylo117 avatar Apr 07 '20 15:04 zylo117

@zylo117 Just checked out your repo; it looks promising, thank you! The most important thing is for the repo owner to keep going: working on it, fixing issues, etc. I hope you'll stick with it!

turner-rovco avatar Apr 08 '20 08:04 turner-rovco