wide-residual-networks
Wrong conclusions
Just took another look at https://arxiv.org/pdf/1605.07146v1.pdf
To summarize (the paper's conclusions):
- widening consistently improves performance across residual networks of different depth;
- increasing both depth and width helps until the number of parameters becomes too high and stronger regularization is needed.
Actually, the first conclusion contradicts the second in a way, but that's not the point.
You may really want to try running tests on MNIST with absolutely no augmentation, as a task on which it is actually easy to overfit. The lesson I learnt from MNIST is that there seems to be an optimal width & depth, which is actually rather low for such a task. Also, the standard block/activation scheduling (the standard "preact" ordering) may not always be optimal, and groups (as in https://arxiv.org/pdf/1605.06489v1.pdf ) are hugely beneficial, at least up to a certain number of them (see the sketch below).
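To make concrete what I mean by groups, here is a minimal sketch of a pre-activation residual block whose 3x3 convolutions are grouped. This is written in PyTorch purely for illustration (the repo itself is Torch/Lua), and the names `GroupedPreactBlock`, `width`, and `groups` are my own, not from the repo or the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedPreactBlock(nn.Module):
    """Pre-activation residual block with grouped 3x3 convolutions (illustrative)."""
    def __init__(self, width, groups):
        super().__init__()
        # BN -> ReLU -> conv ("preact") ordering, as in the standard wide-resnet block
        self.bn1 = nn.BatchNorm2d(width)
        self.conv1 = nn.Conv2d(width, width, 3, padding=1, groups=groups, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, padding=1, groups=groups, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return x + out  # identity shortcut; width is unchanged inside the block
```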
I was able to achieve a 0.25% peak error rate pretty easily, and my best architecture reached the same peak while also holding a 0.26% error rate across many epochs, which was rather hard to get here: at this level of precision the across-epoch fluctuations are relatively large. This was without any parameter smoothing, such as a moving average of the weights.
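For reference, by parameter smoothing I mean something like an exponential moving average of the weights that is used at evaluation time instead of the raw weights. A minimal sketch, again in PyTorch with illustrative names (`EMA`, `decay`):

```python
import torch

class EMA:
    """Keep an exponential moving average of a model's parameters (illustrative)."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        # detached copies of the parameters to average into
        self.shadow = {name: p.detach().clone() for name, p in model.named_parameters()}

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for name, p in model.named_parameters():
            self.shadow[name].mul_(self.decay).add_(p.detach(), alpha=1 - self.decay)

    @torch.no_grad()
    def copy_to(self, model):
        # load the averaged weights, e.g. before evaluation
        for name, p in model.named_parameters():
            p.copy_(self.shadow[name])
```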
CIFAR performs pretty poorly with that architecture, though.