DQN_DDQN_Dueling_and_DDPG_Tensorflow
Will removing batch normalization significantly hurt performance?
Hi, spiglerg! Thank you for replying to me so soon. However, I wonder whether removing batch normalization will significantly hurt performance, because I want to test the code on the "Reacher" task. I have seen that your code works well on "Reacher"; it is currently the only algorithm solving that task in the OpenAI Gym. Can changing the env_name to "Reacher-v1" achieve comparable performance (solving "Reacher" after 25149 episodes)? I want to implement the prioritized experience replay mechanism on the "Reacher" task. Also, the variable 'unoform' can be removed from network.py. Thanks a lot! Regards, Cardwing
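(For context, a minimal proportional prioritized replay buffer in the style of Schaul et al., 2015, could be sketched as follows. This is illustrative only and not part of this repo; the class and parameter names are my own.)

import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (sketch)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                  # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        samples = [self.buffer[i] for i in indices]
        # Importance-sampling weights correct the bias from non-uniform
        # sampling; normalize by the max for stability.
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return samples, indices, weights.astype(np.float32)

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error.
        for idx, err in zip(indices, td_errors):
            self.priorities[idx] = abs(err) + eps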
I didn't remember it was the only algorithm to solve it. :P I am pretty sure that TRPO would do better anyway.
In any case, I did not use batch normalization in any of the submitted solutions, so it should still solve it. You might have to play with the parameters a bit though.
For batch normalization you can try the following, but I made some untested modifications, so make sure it works first:
import tensorflow as tf

def batch_norm(x, beta, gamma, is_training, is_convnet=False):
    """
    Batch normalization utility.
    Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
    Args:
        x: Tensor, 2D [BD] or 4D [BHWD] input maps
        beta: Tensor, learnable shift parameter
        gamma: Tensor, learnable scale parameter
        is_training: boolean tf.Variable, true indicates training phase
        is_convnet: if True, also average moments over the spatial dimensions
    Return:
        normed: batch-normalized maps
    """
    with tf.variable_scope('batch_norm'):
        # Fully connected layers: moments over the batch dimension only.
        # Conv layers: moments over batch, height, and width.
        moments_dimensions = [0]
        if is_convnet:
            moments_dimensions = [0, 1, 2]
        with tf.device('/cpu:0'):
            batch_mean, batch_var = tf.nn.moments(x, moments_dimensions, name='moments')
        # Track running statistics for use at test time.
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # Use batch statistics during training, moving averages at test time.
        mean, var = tf.cond(is_training,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
        return normed
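For example, it could be wired up like this (a sketch assuming TensorFlow 1.x; the variable names and the feature size of 64 are illustrative, not from the repo):

x = tf.placeholder(tf.float32, [None, 64])    # 2D [batch, features] input
is_training = tf.placeholder(tf.bool, [])     # switches between train/test statistics
beta = tf.Variable(tf.zeros([64]))            # learnable shift, one per feature
gamma = tf.Variable(tf.ones([64]))            # learnable scale, one per feature
h = batch_norm(x, beta, gamma, is_training)   # normalized activations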
I see that the reward hovers around -11 on the "Reacher" task after 20000 frames. Is this the code that achieves good performance on "Reacher"?
Hmm, my MuJoCo license has expired, so apparently I can't test it now. xD Did you try changing the discount factor or the other parameters? Also note that the ~25,000 is the number of episodes, not frames. Looking at my Gym submission, I can see that it took 500-800k frames to converge. :)
Thank you for your help. I will try again.
Awesome. :)