All-Conv-Keras
Strided convolutions missing activation?
The MaxPooling2D layers already provide a nonlinearity similar to ReLU, so no separate activation function is needed after them.
By replacing them with strided Conv2D layers we lose that nonlinearity, so a ReLU activation should be added; otherwise the layer is essentially useless (it could be merged with the next layer, since a composition of linear operations is still linear).
The paper also indicates that all Conv2D layers use ReLU activation.
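For illustration, a minimal sketch of what the change might look like. This is not the repo's actual allconv.py; the filter counts and input shape are placeholders, the point is only that the strided Conv2D carries its own ReLU:

```python
# Minimal sketch (not the repo's actual allconv.py): a strided Conv2D replacing a
# MaxPooling2D must carry its own activation, otherwise it stays purely linear.
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(96, (3, 3), padding='same', activation='relu',
                 input_shape=(32, 32, 3)))  # placeholder sizes
# Downsampling step: instead of MaxPooling2D(pool_size=(2, 2)) ...
model.add(Conv2D(96, (3, 3), strides=(2, 2), padding='same',
                 activation='relu'))  # the ReLU here is the fix being discussed
model.add(Conv2D(192, (3, 3), padding='same', activation='relu'))
```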
@marcj maybe this explains the missing performance from #4.
I did that and only got a 1% improvement.
See here the experiment with ReLU activation for all strided Conv layers: https://trainer.aetros.com/model/marcj/keras:all-conv/job/f1b5348da3f2d7879ebe4390a60d3f5f43d517ac
Script diff: https://trainer.aetros.com/api/file-compare/marcj/keras:all-conv/92fcd671c6814c375edd404a65edc66c00ba5aec...marcj/keras:all-conv/f1b5348da3f2d7879ebe4390a60d3f5f43d517ac:allconv.py
Here you can see the side-by-side comparison: https://trainer.aetros.com/compare/marcj/keras:all-conv/92fcd671c6814c375edd404a65edc66c00ba5aec,marcj/keras:all-conv/f1b5348da3f2d7879ebe4390a60d3f5f43d517ac
Left without ReLU (old), right with ReLU (new):

Old accuracy (without ReLU):
training: 95.12%
validation: 90.11%

New accuracy (with ReLU):
training: 96.86%
validation: 91.06%
Yes, a ReLU activation has to be added.
To reach the accuracy reported in the paper, I have contacted the authors. The paper states that their results were obtained with very extensive, heavily hand-tuned data augmentation, but the details of those methods are not given. I am hoping to get a reply and add those techniques here.
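For reference, a hypothetical sketch of the kind of augmentation Keras supports out of the box; this is not the authors' actual pipeline, and the parameter values below are placeholders only:

```python
# Hypothetical augmentation sketch using Keras's ImageDataGenerator; the authors'
# actual augmentation pipeline is unknown, so these settings are placeholders.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    width_shift_range=0.1,   # random horizontal shifts (placeholder value)
    height_shift_range=0.1,  # random vertical shifts (placeholder value)
    horizontal_flip=True)    # random left-right flips

# Training would then consume datagen.flow(x_train, y_train, batch_size=32)
# via fit_generator instead of calling model.fit on the raw arrays.
```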
Not so cool. If we don't know what data the network was trained on, we cannot compare it with other networks. Thanks for contacting them; we will see what they say.
Interesting, it is a good idea to add the ReLU.