pixel-cnn
The output of conv2d should be updated after g and b are updated in data-dependent initialization.
The initial values for g and b are chosen to keep the pre-activation values normally distributed. After the tf.assign operations for g and b, the output of the current conv2d layer changes, so the input to the next layer changes as well. I think the initialization of g and b for the next layer should therefore depend on the new conv2d output.
So I think the customized conv2d in nn.py should be modified as follows:
def conv2d(x_, num_filters, filter_size=[3, 3], stride=[1, 1], pad='SAME', nonlinearity=None, init_scale=1., counters={},
           init=False, ema=None, **kwargs):
    ''' convolutional layer '''
    name = get_name('conv2d', counters)
    with tf.variable_scope(name):
        V = get_var_maybe_avg('V', ema, shape=filter_size + [int(x_.get_shape()[-1]), num_filters], dtype=tf.float32,
                              initializer=tf.random_normal_initializer(0, 0.05), trainable=True)
        g = get_var_maybe_avg('g', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(1.), trainable=True)
        b = get_var_maybe_avg('b', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(0.), trainable=True)

        # use weight normalization (Salimans & Kingma, 2016)
        W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])

        # calculate convolutional layer output
        x = tf.nn.bias_add(tf.nn.conv2d(x_, W, [1] + stride + [1], pad), b)

        if init:  # normalize x
            m_init, v_init = tf.nn.moments(x, [0, 1, 2])
            scale_init = init_scale / tf.sqrt(v_init + 1e-10)
            with tf.control_dependencies([g.assign(g * scale_init), b.assign_add(-m_init * scale_init)]):
                # recompute the output with the updated g and b, so the next
                # layer's data-dependent init sees the normalized activations
                W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])
                x = tf.nn.bias_add(tf.nn.conv2d(x_, W, [1] + stride + [1], pad), b)

        # apply nonlinearity
        if nonlinearity is not None:
            x = nonlinearity(x)
        return x
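To illustrate why this matters, here is a minimal NumPy sketch of the data-dependent initialization, with a dense layer standing in for conv2d (the helper names are my own, not from the repo): if the second layer's g and b are fit to the first layer's output from before the assign, then feeding it the post-assign output no longer yields unit-variance pre-activations; fitting them to the updated output does.

```python
import numpy as np

rng = np.random.default_rng(0)

def wn_dense(x, V_norm, g, b):
    # weight-normalized dense layer: W = g * V / ||V||, output = x @ W + b
    return (x @ V_norm) * g + b

def data_dependent_gb(x, V_norm, init_scale=1.0):
    # pick g, b so pre-activations on this batch have mean 0, std init_scale
    t = x @ V_norm
    m, v = t.mean(axis=0), t.var(axis=0)
    g = init_scale / np.sqrt(v + 1e-10)
    return g, -m * g

def random_V_norm(n_in, n_out):
    V = rng.normal(0, 0.05, size=(n_in, n_out))
    return V / np.linalg.norm(V, axis=0, keepdims=True)

x0 = rng.normal(3.0, 5.0, size=(1024, 16))  # un-normalized input batch
V1, V2 = random_V_norm(16, 32), random_V_norm(32, 32)

g1, b1 = data_dependent_gb(x0, V1)
x1_stale = x0 @ V1                 # layer-1 output BEFORE the assign (g=1, b=0)
x1_new = wn_dense(x0, V1, g1, b1)  # layer-1 output AFTER the assign

# wrong: fit layer 2 to the stale output, then feed it the real one
g2s, b2s = data_dependent_gb(x1_stale, V2)
x2_stale = wn_dense(x1_new, V2, g2s, b2s)

# right: fit layer 2 to the updated output
g2, b2 = data_dependent_gb(x1_new, V2)
x2 = wn_dense(x1_new, V2, g2, b2)

print(x2_stale.std(axis=0).mean())  # far from 1
print(x2.std(axis=0).mean())        # ~1
```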
I have a simple question: what is the idea behind having different weight normalization "flows" when init=True vs. init=False?
@harsh306 This is the data-dependent initialization for g and b. You can find the details in the Weight Normalization paper.
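The two flows can be sketched with a toy weight-normalized dense layer (not the repo's code): with init=True the layer fits g and b to the statistics of the current batch, once, at initialization time; with init=False it simply reuses the stored values, which is the ordinary training/inference path.

```python
import numpy as np

rng = np.random.default_rng(1)

class DenseWN:
    """Toy weight-normalized dense layer with the two 'flows'."""
    def __init__(self, n_in, n_out):
        V = rng.normal(0, 0.05, size=(n_in, n_out))
        self.V_norm = V / np.linalg.norm(V, axis=0, keepdims=True)
        self.g = np.ones(n_out)   # scale, set once by data-dependent init
        self.b = np.zeros(n_out)  # bias, set once by data-dependent init

    def __call__(self, x, init=False):
        t = x @ self.V_norm
        if init:  # init=True flow: fit g, b to this batch's statistics
            m, v = t.mean(axis=0), t.var(axis=0)
            self.g = 1.0 / np.sqrt(v + 1e-10)
            self.b = -m * self.g
        return t * self.g + self.b  # init=False flow: reuse stored g, b

layer = DenseWN(8, 4)
out_init = layer(rng.normal(2.0, 3.0, size=(512, 8)), init=True)
out_later = layer(rng.normal(2.0, 3.0, size=(512, 8)))  # stored g, b reused
print(out_init.std(axis=0))  # ~1 on the init batch by construction
```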