DQN_DDQN_Dueling_and_DDPG_Tensorflow
Will removing batch normalization significantly hurt performance?
Hi, spiglerg! Thank you for replying to me so soon. However, I wonder whether removing batch normalization will significantly hurt performance, because I want to test the code on the "Reacher" task. I have seen that your code works well on "Reacher"; it is currently the only algorithm solving that task in the OpenAI Gym. Can changing the env_name to "Reacher-v1" achieve comparable performance (solving "Reacher" after 25149 episodes)? I want to implement the prioritized experience replay mechanism on the "Reacher" task. Also, the variable 'unoform' can be removed from network.py. Thanks a lot! Regards, Cardwing
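(For context, a minimal proportional prioritized replay buffer in the style of Schaul et al., 2015, could be sketched as follows. This is illustrative only and not part of this repo; the class and parameter names are my own.)

import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (sketch)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                  # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        samples = [self.buffer[i] for i in indices]
        # Importance-sampling weights correct the bias from non-uniform
        # sampling; normalize by the max for stability.
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return samples, indices, weights.astype(np.float32)

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error.
        for idx, err in zip(indices, td_errors):
            self.priorities[idx] = abs(err) + eps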
I didn't remember it was the only algorithm to solve it. :P I am pretty sure that TRPO would do better anyway.
In any case, I did not use batch normalization in any of the submitted solutions, so it should still solve it. You might have to play with the parameters a bit though.
For batch normalization you can try the following, but I made some untested modifications, so make sure it works first:
import tensorflow as tf

def batch_norm(x, beta, gamma, is_training, is_convnet=False):
    """
    Batch normalization utility.
    Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
    Args:
        x: Tensor, 2D [BD] or 4D [BHWD] input maps
        beta: Tensor, learnable shift parameter
        gamma: Tensor, learnable scale parameter
        is_training: boolean tf.Variable, true indicates training phase
        is_convnet: if True, also average moments over the spatial dimensions
    Return:
        normed: batch-normalized maps
    """
    with tf.variable_scope('batch_norm'):
        # Fully connected layers: moments over the batch dimension only.
        # Conv layers: moments over batch, height, and width.
        moments_dimensions = [0]
        if is_convnet:
            moments_dimensions = [0, 1, 2]
        with tf.device('/cpu:0'):
            batch_mean, batch_var = tf.nn.moments(x, moments_dimensions, name='moments')
        # Track running statistics for use at test time.
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # Use batch statistics during training, moving averages at test time.
        mean, var = tf.cond(is_training,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
        return normed
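For example, it could be wired up like this (a sketch assuming TensorFlow 1.x; the variable names and the feature size of 64 are illustrative, not from the repo):

x = tf.placeholder(tf.float32, [None, 64])    # 2D [batch, features] input
is_training = tf.placeholder(tf.bool, [])     # switches between train/test statistics
beta = tf.Variable(tf.zeros([64]))            # learnable shift, one per feature
gamma = tf.Variable(tf.ones([64]))            # learnable scale, one per feature
h = batch_norm(x, beta, gamma, is_training)   # normalized activations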
I see that the reward hovers around -11 on the "Reacher" task after 20000 frames. Is this the code that achieves good performance on "Reacher"?
Hmm, my MuJoCo license has expired, so apparently I can't test it now. xD Did you try changing the discount factor or the other parameters? Also note that the ~25,000 is the number of episodes, not frames. Looking at my Gym submission, I can see that it took 500-800k frames to converge. :)
Thank you for your help. I will try again.
Awesome. :)