
The output of conv2d should be updated after g and b are updated in data-dependent initialization.

Open bfs18 opened this issue 6 years ago • 2 comments

The initial values of g and b are chosen to keep the pre-activation values normally distributed. After the tf.assign operations on g and b, the output of the current conv2d layer changes, so the input to the next layer changes as well. I think the initialization of g and b for the next layer should depend on the new conv2d output, so the customized conv2d in nn.py should be modified as follows:

def conv2d(x_, num_filters, filter_size=[3, 3], stride=[1, 1], pad='SAME', nonlinearity=None, init_scale=1., counters={},
           init=False, ema=None, **kwargs):
    ''' convolutional layer '''
    name = get_name('conv2d', counters)
    with tf.variable_scope(name):
        V = get_var_maybe_avg('V', ema, shape=filter_size + [int(x_.get_shape()[-1]), num_filters], dtype=tf.float32,
                              initializer=tf.random_normal_initializer(0, 0.05), trainable=True)
        g = get_var_maybe_avg('g', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(1.), trainable=True)
        b = get_var_maybe_avg('b', ema, shape=[num_filters], dtype=tf.float32,
                              initializer=tf.constant_initializer(0.), trainable=True)

        # use weight normalization (Salimans & Kingma, 2016)
        W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])

        # calculate convolutional layer output
        x = tf.nn.bias_add(tf.nn.conv2d(x_, W, [1] + stride + [1], pad), b)

        if init:  # normalize x
            m_init, v_init = tf.nn.moments(x, [0, 1, 2])
            scale_init = init_scale / tf.sqrt(v_init + 1e-10)
            with tf.control_dependencies([g.assign(g * scale_init), b.assign_add(-m_init * scale_init)]):
                # x = tf.identity(x)
                W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])
                x = tf.nn.bias_add(tf.nn.conv2d(x_, W, [1] + stride + [1], pad), b)

        # apply nonlinearity
        if nonlinearity is not None:
            x = nonlinearity(x)

        return x
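To see why recomputing x with the updated g and b normalizes the pre-activations, here is a minimal NumPy sketch (toy values; it assumes b starts at zero, as in the code above, so updating g and b amounts to the affine transform scale_init * (x - m_init)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pre-activations from a first forward pass: (batch, H, W, num_filters).
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 16))

init_scale = 1.0
m_init = x.mean(axis=(0, 1, 2))                 # per-filter mean
v_init = x.var(axis=(0, 1, 2))                  # per-filter variance
scale_init = init_scale / np.sqrt(v_init + 1e-10)

# Data-dependent init: g <- g * scale_init, b <- b - m_init * scale_init.
# With b initially zero, the recomputed pre-activation is equivalent to:
x_new = scale_init * (x - m_init)

print(np.allclose(x_new.mean(axis=(0, 1, 2)), 0.0, atol=1e-8))  # mean ~ 0
print(np.allclose(x_new.std(axis=(0, 1, 2)), init_scale))       # std ~ init_scale
```

This is exactly the normalization the issue is about: if the next layer computes m_init and v_init from the *stale* x instead of x_new, its g and b are fit to statistics that no longer hold after the assigns.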

bfs18 avatar Jun 18 '18 17:06 bfs18

I have a simple question: what is the idea behind having different weight normalization "flows" when init=True vs. init=False?

harsh306 avatar Jun 18 '18 17:06 harsh306

@harsh306 The init=True branch is the data-dependent initialization of g and b: it runs once, on a batch of real data, to set them so the pre-activations start out with zero mean and unit variance. You can find the details in the Weight Normalization paper.
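For reference, the reparameterization from that paper writes each weight vector as w = (g / ||v||) * v, decoupling the norm (the scalar g, which the data-dependent init rescales) from the direction v. A minimal sketch with toy values:

```python
import numpy as np

v = np.array([3.0, 4.0])   # unnormalized direction, ||v|| = 5
g = 2.0                    # learned per-filter scale

# Weight-norm reparameterization: the norm of w is exactly g,
# independent of the magnitude of v.
w = g * v / np.linalg.norm(v)

print(w)                   # [1.2 1.6]
print(np.linalg.norm(w))   # 2.0
```

The init branch in conv2d above does the same thing per output filter, with tf.nn.l2_normalize(V, [0, 1, 2]) playing the role of v / ||v||.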

bfs18 avatar Jun 19 '18 02:06 bfs18