
BatchNorm before activation vs BatchNorm after activation


Thanks for your implementation of the Octave Conv paper.

I have a remark/question about the Conv_BN_ACT module. Since applying BatchNorm after the activation sometimes makes more sense, I followed the PyTorch example and built a small OctaveCNN (6 convs in total in each variant) for the CIFAR-10 dataset, using PReLU activations, cross-entropy loss, the AMSGrad optimizer, and alpha=0.25. After experimenting with BatchNorm placed before and after the activation, I found the following results:

| Network | Epochs | Accuracy (%) | Training loss | Test loss |
|---|---|---|---|---|
| Conv_BN_ACT | 15 | 78.46 | 0.7093 | 0.6362 |
| Conv_BN_ACT | 30 | 82.20 | 0.4613 | 0.5456 |
| Conv_ACT_BN | 15 | 82.84 | 0.3917 | 0.5260 |
| Conv_ACT_BN | 30 | 84.18 | 0.1614 | 0.6036 |
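
For clarity, here is a minimal sketch of the two block orderings I compared. It uses plain `nn.Conv2d` rather than this repo's `OctaveConv` (which passes a high/low-frequency pair between layers), so the class names below are illustrative only; the actual `Conv_BN_ACT` applies the same pattern to both frequency branches.

```python
import torch.nn as nn


class ConvBNAct(nn.Module):
    """Conv -> BatchNorm -> activation, the ordering used by Conv_BN_ACT."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ConvActBN(nn.Module):
    """Conv -> activation -> BatchNorm, the variant I tested (Conv_ACT_BN)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.act = nn.PReLU(out_ch)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))
```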

I observe that Conv_ACT_BN has a stronger tendency to overfit: its training loss drops noticeably below its test loss, while the gap for Conv_BN_ACT is much smaller. However, Conv_ACT_BN also reaches a clearly higher accuracy.

Have you looked into this before? Is this the reason why you chose to include Conv_BN_ACT and not Conv_ACT_BN?
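
For reference, the rest of my training setup looks roughly like this. The model below is a tiny stand-in rather than my actual 6-conv OctaveCNN, so only the loss/optimizer configuration is meant to carry over:

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the real network is a 6-conv OctaveCNN with alpha=0.25.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.PReLU(32),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

criterion = nn.CrossEntropyLoss()
# "AMSGrad optimizer" here means Adam with its AMSGrad variant enabled.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

# One CIFAR-10-shaped training step with random data, just to show the loop.
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```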

AlexanderHustinx · Sep 11 '20