chainer-fast-neuralstyle

Why does FastStyleNet use w = math.sqrt(2) in ResidualBlock?

Open jackieyung opened this issue 9 years ago • 2 comments

Thanks for reading my problem.

When I checked FastStyleNet, I found that the Convolution2D layers are given w = math.sqrt(2). The code is below:

    class ResidualBlock(chainer.Chain):
        def __init__(self, n_in, n_out, stride=1, ksize=3):
            w = math.sqrt(2)
            super(ResidualBlock, self).__init__(
                c1=L.Convolution2D(n_in, n_out, ksize, stride, 1, w),
                c2=L.Convolution2D(n_out, n_out, ksize, 1, 1, w),
                b1=L.BatchNormalization(n_out),
                b2=L.BatchNormalization(n_out)
            )

I have checked the Convolution2D source code, and the parameter w is a scale.

What I don't understand is why it is set to sqrt(2). Could it be 1?

Thanks very much.

jackieyung • Sep 30 '16 11:09

http://docs.chainer.org/en/stable/_modules/chainer/links/connection/convolution_2d.html#Convolution2D

wscale is only used for the initializer.

So this w is the scale used when initializing the weights with Gaussian noise. It is used only at initialization; during training and execution of the model it plays no further role. My guess would be that the actual value is more or less empirically chosen as a trade-off between initial noisiness and training time.
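For reference, here is a minimal sketch of what a weight scale of sqrt(2) amounts to. It assumes (this is not stated in the thread) that the default initializer draws each weight from a zero-mean Gaussian with standard deviation wscale * sqrt(1 / fan_in), where fan_in = n_in * ksize * ksize. Under that assumption, wscale = sqrt(2) gives the standard deviation sqrt(2 / fan_in), which is the value used by He initialization for layers followed by ReLU. The helper names and the 128-channel size are hypothetical, and plain NumPy stands in for Chainer.

```python
import math
import numpy as np


def init_std(n_in, ksize, wscale=1.0):
    """Std of the initial weights under the ASSUMED rule wscale * sqrt(1 / fan_in)."""
    fan_in = n_in * ksize * ksize
    return wscale * math.sqrt(1.0 / fan_in)


def init_conv_weights(n_in, n_out, ksize, wscale=1.0, seed=0):
    """Draw a (n_out, n_in, ksize, ksize) kernel from a zero-mean Gaussian."""
    rng = np.random.default_rng(seed)
    std = init_std(n_in, ksize, wscale)
    return rng.normal(0.0, std, size=(n_out, n_in, ksize, ksize))


# Illustrative sizes only: 128 channels, 3x3 kernel.
n_in, n_out, ksize = 128, 128, 3
he_std = math.sqrt(2.0 / (n_in * ksize * ksize))  # He initialization std

W = init_conv_weights(n_in, n_out, ksize, wscale=math.sqrt(2))
print("std with wscale=1:      ", init_std(n_in, ksize, 1.0))
print("std with wscale=sqrt(2):", init_std(n_in, ksize, math.sqrt(2)))
print("He initialization std:  ", he_std)
print("empirical std of W:     ", W.std())
```

With wscale = 1 the same rule gives a standard deviation smaller by a factor of sqrt(2), so the network still trains, but it starts from smaller weights.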

If you're willing to wait longer, you could try setting it even lower, so that the NN starts out with a lower response (= more gray) but also with less noise, but then it might take longer for the NN to learn to produce full-amplitude outputs.
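The "lower response" effect can be illustrated with a toy experiment. This is only a sketch under stated assumptions, not FastStyleNet itself: fully connected layers stand in for the convolutions, there is no batch normalization (which the real ResidualBlock has and which rescales activations), and the function name, depth, and width are hypothetical. It pushes random input through a stack of freshly initialized ReLU layers and reports the RMS amplitude of the output for two weight scales.

```python
import math
import numpy as np


def rms_after_relu_stack(wscale, depth=10, width=512, seed=0):
    """RMS of activations after `depth` randomly initialized ReLU layers.

    Weights are Gaussian with std wscale / sqrt(fan_in), fan_in = width.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(256, width))  # batch of unit-variance inputs
    for _ in range(depth):
        W = rng.normal(0.0, wscale / math.sqrt(width), size=(width, width))
        x = np.maximum(x @ W, 0.0)  # linear layer followed by ReLU
    return float(np.sqrt(np.mean(x ** 2)))


rms_sqrt2 = rms_after_relu_stack(math.sqrt(2))
rms_one = rms_after_relu_stack(1.0)
print("RMS output, wscale=sqrt(2):", rms_sqrt2)
print("RMS output, wscale=1:      ", rms_one)
```

In this toy setting each ReLU layer multiplies the mean squared activation by roughly wscale^2 / 2, so sqrt(2) keeps the amplitude about constant while 1 halves it per layer. With batch normalization in the block, the difference in the real network is likely smaller.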

fxtentacle • Oct 16 '16 03:10

@fxtentacle Thanks a lot. I understand it now.

jackieyung • Oct 16 '16 07:10