Weight-Normalization

What role does the param init play?

MachineJeff opened this issue · 3 comments

Hi, I'd like to know what the param init does in your code.

And why do you create 3 models in the same train script? I was confused.

MachineJeff · Nov 25 '19

Hello. In the templates, the weights and all other parameters are defined with "tf.get_variable(...)", and they are defined once, inside the model template. As a result, the initialization, training, and testing phases share the same weights. Before the training phase, the parameters are initialized by calling the model template on the initialization inputs, so they are initialized with a feed-forward step (data-dependent initialization). After the parameters have been initialized, the training phase runs with those initialized parameters. A template can handle being called multiple times: if the parameters were already initialized by an earlier call, the next call to the model template reuses the initialized variables.
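
As an illustration only (a minimal sketch, assuming the model template is built with tf.make_template; the names below are not from the repo), calling the same template three times creates the variables once and reuses them afterwards:

    import tensorflow as tf

    def _net(x):
        # every parameter is created with tf.get_variable, so a template can reuse it
        w = tf.get_variable('w', [x.get_shape().as_list()[-1], 10], tf.float32,
                            tf.random_normal_initializer(0, 0.05))
        return tf.matmul(x, w)

    model = tf.make_template('model', _net)

    x_init  = tf.placeholder(tf.float32, [None, 4])
    x_train = tf.placeholder(tf.float32, [None, 4])
    x_test  = tf.placeholder(tf.float32, [None, 4])

    init_out  = model(x_init)   # first call: creates 'model/w'
    train_out = model(x_train)  # second call: reuses the same 'model/w'
    test_out  = model(x_test)   # third call: reuses it again

    # all three sub-graphs share a single set of weights
    assert len(tf.trainable_variables()) == 1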

zoli333 · Nov 25 '19

Right, I get your point.

But why not remove the param init and just create one model, like this:

init_forward = model(x_init, keep_prob=0.5, deterministic=is_training,
                     use_weight_normalization=use_weight_normalization,
                     use_batch_normalization=use_batch_normalization,
                     use_mean_only_batch_normalization=use_mean_only_batch_normalization)

Use the variable is_training to distinguish between training and testing, and then run both train and test through the same init_forward model?
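
Just to make the idea concrete (a minimal sketch, not the repo's code: dropout stands in for whatever behaviour differs between phases, and is_training is fed as a boolean placeholder):

    import numpy as np
    import tensorflow as tf

    is_training = tf.placeholder(tf.bool, name='is_training')
    x = tf.placeholder(tf.float32, [None, 64])

    # one graph; the boolean placeholder picks the stochastic or deterministic path
    h = tf.cond(is_training,
                lambda: tf.nn.dropout(x, keep_prob=0.5),  # training path
                lambda: x)                                 # test path

    batch = np.random.rand(8, 64).astype(np.float32)
    with tf.Session() as sess:
        train_out = sess.run(h, feed_dict={x: batch, is_training: True})
        test_out  = sess.run(h, feed_dict={x: batch, is_training: False})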

MachineJeff · Nov 26 '19

Besides, I really appreciate your weight_norm code. I don't like the param init, so I rewrote the weight_norm code into this:

def wn_conv1d(x, kernel_size, channels, scope, stride=1, pad='SAME', dilation=1, nonlinearity=None, init_scale=1.):
    # x is expected to be 4-D, since the 1-D convolution is expressed as a conv2d with filter height 1
    xs = int_shape(x)              # int_shape: helper returning the static shape as a list of ints
    filter_size = [1, kernel_size]
    dila = [1, dilation]
    strs = [1, stride]
    with tf.variable_scope(scope):
        # data based initialization of parameters
        V = tf.get_variable('V', filter_size + [xs[-1], channels], tf.float32,
                            tf.random_normal_initializer(0, 0.05), trainable=True)
        # normalize the filter over its non-output axes, as in weight normalization
        V_norm = tf.nn.l2_normalize(V.initialized_value(), [0, 1, 2])
        # strides and dilations are given as full length-4 NHWC lists
        x_init = tf.nn.conv2d(x, V_norm, [1] + strs + [1], pad, dilations=[1] + dila + [1])
        # per-channel statistics of the initial pre-activations
        m_init, v_init = tf.nn.moments(x_init, [0, 1, 2])
        scale_init = init_scale / tf.sqrt(v_init + 1e-8)
        # rescale so each output channel initially has zero mean and unit variance
        x_init = tf.reshape(scale_init, [1, 1, 1, channels]) * (x_init - tf.reshape(m_init, [1, 1, 1, channels]))
        if nonlinearity is not None:
            x_init = nonlinearity(x_init)
        return x_init

That's a conv1d rather than a conv2d, even though it calls tf.nn.conv2d internally (it doesn't matter for the question).

Do you think there is any problem with my code?

MachineJeff · Nov 26 '19