DeepLearnToolbox
output weight initialization in CNN
In cnnsetup, I don't always obtain convergence using the random initialization given in the code:

    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));

In some cases I'm better off setting all of these weights to 0:

    net.ffW = zeros(onum, fvnum);

or using something in between, like:

    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum * fvnum));
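For concreteness, here is a small sketch (my own, not code from the toolbox) putting the three variants side by side. The sizes onum and fvnum are assumed example values, roughly matching the toolbox's MNIST demo (10 outputs, 4*4*12 = 192 features):

    % Assumed example sizes for the final fully connected layer.
    onum = 10; fvnum = 192;

    % The three initializations under discussion.
    W_sum  = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));
    W_zero = zeros(onum, fvnum);
    W_prod = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum * fvnum));

    % Compare the resulting weight scales.
    fprintf('max |W|: sum-scaled %.3f, zero %.3f, product-scaled %.3f\n', ...
            max(abs(W_sum(:))), max(abs(W_zero(:))), max(abs(W_prod(:))));

The product-scaled version gives a much narrower range than the sum-scaled one for layers of any realistic size, so it sits between the other two in scale.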
Ok. I think I got that formula from a paper by Yann LeCun. IMO, it's just one of the things you need to vary when playing with these nets. If you can prove that your initialization is superior (theoretically or by experiments), I'll happily change the formula.
I think initializing weights to 0 is theoretically wrong: it fails to break symmetry. From the reference http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm:

"If all the parameters start off at identical values, then all the hidden layer units will end up learning the same function of the input (more formally, W^{(1)}_{ij} will be the same for all values of i, so that a^{(2)}_1 = a^{(2)}_2 = a^{(2)}_3 = \ldots for any input x). The random initialization serves the purpose of symmetry breaking."