BatchNorm comes after ReLU in the fully connected layers of your VGG implementation. That is not common practice in deep nets.
You're right, and they do come before the ReLUs in the conv layers. There are normally no batchnorms in the classifier, but since the paper says they replaced dropout with BN, I must have find-replaced all dropout layers with batchnorm.
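For what it's worth, here's a minimal PyTorch sketch of the two orderings being discussed. The layer sizes and the 10-class output are just illustrative placeholders, not taken from the actual implementation:

```python
import torch.nn as nn

# Conv block with the common ordering: Conv -> BatchNorm -> ReLU
conv_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.BatchNorm2d(128),   # normalizes the pre-activation values
    nn.ReLU(inplace=True),
)

# Classifier where dropout was naively find-replaced with BatchNorm:
# Linear -> ReLU -> BatchNorm (BN ends up after the activation)
classifier_bn_after_relu = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096),
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(4096),  # sits where Dropout used to be
    nn.Linear(4096, 10),
)

# The more common ordering, if BN is used in the classifier at all:
# Linear -> BatchNorm -> ReLU
classifier_bn_before_relu = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096),
    nn.BatchNorm1d(4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 10),
)
```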
I came across this Reddit thread and now I'm not sure where BatchNorm belongs anymore.
Watch this video from Andrew Ng: https://www.youtube.com/watch?v=tNIpEZLv_eg&t=328s. Around the third minute he talks about this and mentions that normalizing the values before applying the activation function is much more common.