yolo2_light

fuse convolutional and batch_norm weights into one convolutional layer

DIAMONDWHILE opened this issue 5 years ago • 4 comments

Excuse me, could you tell me where to find the paper on fusing convolution and batch normalization?

DIAMONDWHILE • Sep 18 '18 09:09

There is a simple explanation: http://machinethink.net/blog/object-detection-with-yolo/

And here’s the calculation that batch normalization performs on the output of that convolution:

        gamma * (out[j] - mean)
bn[j] = ---------------------- + beta
            sqrt(variance)

It subtracts the mean from the output pixel, divides by the standard deviation (the square root of the variance), multiplies by a scaling factor gamma, and adds the offset beta. Gamma and beta are learned as the network is trained, while mean and variance are running statistics collected over the training data; together these four parameters are what the batch normalization layer carries.
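For example, take made-up values gamma = 2, beta = 1, mean = 0.5, and variance = 4: the layer computes bn[j] = 2 * (out[j] - 0.5) / sqrt(4) + 1 = out[j] + 0.5.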

To get rid of the batch normalization, we can rearrange the convolution and batch-norm equations to compute new weights and bias terms for the convolution layer:

           gamma * w
w_new = --------------
        sqrt(variance)

        gamma * (b - mean)
b_new = ------------------ + beta
          sqrt(variance)

Performing a convolution with these new weights and bias terms on an input x will give the same result as the original convolution followed by batch normalization.
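To verify this, substitute the convolution output out[j] = w*x + b (writing w*x for the weighted sum over the receptive field) into the batch-norm formula and regroup:

    bn[j] = gamma * (w*x + b - mean) / sqrt(variance) + beta
          = (gamma * w / sqrt(variance)) * x + gamma * (b - mean) / sqrt(variance) + beta
          = w_new * x + b_new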

Now we can remove this batch normalization layer and just use the convolutional layer, but with these adjusted weights and bias terms w_new and b_new. We repeat this procedure for all the convolutional layers in the network.
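Here is a minimal C sketch of that per-layer folding step. The function name, memory layout, and the epsilon added for numerical stability are assumptions for illustration, not the actual yolo2_light API; the real implementation lives in src/additionally.c (linked below).

```c
#include <math.h>

/* Fold batch-norm parameters (gamma, beta, mean, variance) into a
 * convolutional layer's weights and biases, per the equations above.
 * weights holds n_filters blocks of weights_per_filter values each. */
void fold_batchnorm_into_conv(float *weights, float *biases,
                              const float *gamma, const float *beta,
                              const float *mean, const float *variance,
                              int n_filters, int weights_per_filter)
{
    const float eps = 1e-6f;  /* small constant for numerical stability */
    for (int f = 0; f < n_filters; ++f) {
        /* scale = gamma / sqrt(variance), shared by w_new and b_new */
        float scale = gamma[f] / sqrtf(variance[f] + eps);
        for (int i = 0; i < weights_per_filter; ++i)
            weights[f * weights_per_filter + i] *= scale;      /* w_new */
        biases[f] = scale * (biases[f] - mean[f]) + beta[f];   /* b_new */
    }
}
```

Since the scale factor depends only on the filter index, each filter's weights and bias can be adjusted independently.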

AlexeyAB • Sep 18 '18 11:09

Thank you for the explanation.

DIAMONDWHILE • Sep 18 '18 14:09

@DIAMONDWHILE @AlexeyAB can you share the source code that implements the idea described in this link?

abhigoku10 • Jul 15 '19 09:07

> @DIAMONDWHILE @AlexeyAB can you share the source code that implements the idea described in this link?

Note: The convolution layers in YOLO don’t actually use bias, so b is zero in the above equation. But note that after folding the batch norm parameters, the convolution layers do get a bias term.

https://github.com/AlexeyAB/yolo2_light/blob/ab8be63d1e94cc8980d48050b1a308dd3bada4a7/src/additionally.c#L66-L109
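Since b is zero before fusion, the folded bias in that case reduces to:

    b_new = beta - gamma * mean / sqrt(variance)

which is exactly the bias term each fused layer gains.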

MambaWong • Aug 08 '19 02:08