
BatchNorm before activation vs BatchNorm after activation


Thanks for your implementation of the Octave Conv paper.

I have a remark/question about the Conv_BN_ACT module. Since applying BatchNorm after the activation sometimes makes more sense, I followed the PyTorch example and built a small OctaveCNN (6 convs in total in each variant) for the CIFAR-10 dataset, using PReLU activations, cross-entropy loss, the AMSGrad optimizer, and alpha=0.25. After experimenting with BatchNorm placed before and after the activation, I found the following results:

| Network | Epochs | Accuracy (%) | Training loss | Test loss |
|---|---|---|---|---|
| Conv_BN_ACT | 15 | 78.46 | 0.7093 | 0.6362 |
| Conv_BN_ACT | 30 | 82.20 | 0.4613 | 0.5456 |
| Conv_ACT_BN | 15 | 82.84 | 0.3917 | 0.5260 |
| Conv_ACT_BN | 30 | 84.18 | 0.1614 | 0.6036 |
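
For clarity, here is a minimal sketch of the two block orderings I compared. It uses plain `nn.Conv2d` rather than this repo's `OctaveConv` (which passes a high/low-frequency pair between layers), so the class names below are illustrative only; the actual `Conv_BN_ACT` applies the same pattern to both frequency branches.

```python
import torch.nn as nn


class ConvBNAct(nn.Module):
    """Conv -> BatchNorm -> activation, the ordering used by Conv_BN_ACT."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ConvActBN(nn.Module):
    """Conv -> activation -> BatchNorm, the variant I tested (Conv_ACT_BN)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.act = nn.PReLU(out_ch)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))
```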

I observe that Conv_ACT_BN has a stronger tendency to overfit: its training loss drops noticeably below its test loss, while the gap for Conv_BN_ACT is much smaller. However, Conv_ACT_BN also reaches a clearly higher accuracy.

Have you looked into this before? Is this the reason why you chose to include Conv_BN_ACT and not Conv_ACT_BN?
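
For reference, the rest of my training setup looks roughly like this. The model below is a tiny stand-in rather than my actual 6-conv OctaveCNN, so only the loss/optimizer configuration is meant to carry over:

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the real network is a 6-conv OctaveCNN with alpha=0.25.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.PReLU(32),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

criterion = nn.CrossEntropyLoss()
# "AMSGrad optimizer" here means Adam with its AMSGrad variant enabled.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

# One CIFAR-10-shaped training step with random data, just to show the loop.
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```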

AlexanderHustinx · Sep 11 '20