DeepLearnToolbox
output weight initialization in CNN
In cnnsetup, I don't always obtain convergence using the random initialization given in the code:

    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));

In some cases I'm better off setting all of these weights to 0:

    net.ffW = zeros(onum, fvnum);

or using something in between, like:

    net.ffW = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum * fvnum));
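For concreteness, here is a small sketch (my own, not code from the toolbox) putting the three variants side by side. The sizes onum and fvnum are assumed example values, roughly matching the toolbox's MNIST demo (10 outputs, 4*4*12 = 192 features):

    % Assumed example sizes for the final fully connected layer.
    onum = 10; fvnum = 192;

    % The three initializations under discussion.
    W_sum  = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum + fvnum));
    W_zero = zeros(onum, fvnum);
    W_prod = (rand(onum, fvnum) - 0.5) * 2 * sqrt(6 / (onum * fvnum));

    % Compare the resulting weight scales.
    fprintf('max |W|: sum-scaled %.3f, zero %.3f, product-scaled %.3f\n', ...
            max(abs(W_sum(:))), max(abs(W_zero(:))), max(abs(W_prod(:))));

The product-scaled version gives a much narrower range than the sum-scaled one for layers of any realistic size, so it sits between the other two in scale.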
Ok. I think I got that formula from a paper by Yann LeCun. IMO, it's just one of the things you need to vary when playing with these nets. If you can prove that your initialization is superior (theoretically or by experiments), I'll happily change the formula.
I think initializing weights to 0 is theoretically wrong: it fails to break symmetry. From the reference http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm:

"If all the parameters start off at identical values, then all the hidden layer units will end up learning the same function of the input (more formally, W^{(1)}_{ij} will be the same for all values of i, so that a^{(2)}_1 = a^{(2)}_2 = a^{(2)}_3 = \ldots for any input x). The random initialization serves the purpose of symmetry breaking."