DeepLearnToolbox

output weight initialization in CNN

Open · dinosg opened this issue on Jun 20 '14 · 2 comments

In cnnsetup, I don't always obtain convergence using the random initialization given in the code: `net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));`

In some cases I'm better off setting all of these to 0 (`net.ffW = rand(onum, fvnum) * 0;`),

or something in between, like `net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum * fvnum));`
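For concreteness, here is a minimal MATLAB sketch comparing the three variants side by side (the values of `onum` and `fvnum` below are made-up placeholders; in the toolbox, `cnnsetup` derives them from the network architecture):

```matlab
% Placeholder sizes (cnnsetup computes these from the net structure):
onum  = 10;    % number of output units
fvnum = 192;   % length of the feature vector feeding the output layer

% 1) Toolbox default: uniform in +/- sqrt(6 / (onum + fvnum))
ffW_sum  = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));

% 2) All zeros
ffW_zero = rand(onum, fvnum) * 0;

% 3) "In-between": product in the denominator gives a much narrower range
ffW_prod = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum * fvnum));

% The ranges being compared:
fprintf('sum  init range: +/- %.4f\n', sqrt(6 / (onum + fvnum)));
fprintf('prod init range: +/- %.4f\n', sqrt(6 / (onum * fvnum)));
```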

dinosg · Jun 20 '14

Ok. I think I got that formula from a paper by Yann LeCun. IMO, it's just one of the things you need to vary when playing with these nets. If you can prove that your initialization is superior (theoretically or by experiments), I'll happily change the formula.

rasmusbergpalm · Jun 26 '14

I think initializing the weights to 0 is theoretically wrong (it fails to break symmetry). From the reference http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm:

"If all the parameters start off at identical values, then all the hidden layer units will end up learning the same function of the input (more formally, W^{(1)}_{ij} will be the same for all values of i, so that a^{(2)}_1 = a^{(2)}_2 = a^{(2)}_3 = \ldots for any input x). The random initialization serves the purpose of symmetry breaking. "

taygunk · Jun 26 '14