baselines
DDPG fails to run in TF2 branch
if self.param_noise:
    logger.info('setting up param noise')
    for var, perturbed_var in zip(self.actor.variables, self.perturbed_actor.variables):
        if var in actor.perturbable_vars:
            logger.info(' {} <- {} + noise'.format(perturbed_var.name, var.name))
        else:
            logger.info(' {} <- {}'.format(perturbed_var.name, var.name))
    for var, perturbed_var in zip(self.actor.variables, self.perturbed_adaptive_actor.variables):
        if var in actor.perturbable_vars:
            logger.info(' {} <- {} + noise'.format(perturbed_var.name, var.name))
        else:
            logger.info(' {} <- {}'.format(perturbed_var.name, var.name))
When I tried to use DDPG from the TF2 branch on MountainCarContinuous-v0, the code above threw the following exception:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I don't know how to solve it. The problem occurred at the line if var in actor.perturbable_vars:
@oximi123 try
for p_var in actor.perturbable_vars:
    if var.shape == p_var.shape and tf.reduce_all(var == p_var):
        logger.info(' {} <- {} + noise'.format(perturbed_var.name, var.name))
    else:
        logger.info(' {} <- {}'.format(perturbed_var.name, var.name))
instead of
if var in actor.perturbable_vars:
    logger.info(' {} <- {} + noise'.format(perturbed_var.name, var.name))
else:
    logger.info(' {} <- {}'.format(perturbed_var.name, var.name))
You would also need to update every occurrence of if var in actor.perturbable_vars.
I guess the reason is either the equality operator on tf.Variable, which compares element by element, or that the variables behave like nested lists.
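To illustrate that guess without needing TensorFlow, here is a minimal sketch in plain Python. FakeVariable and ElementwiseResult are hypothetical stand-ins, not baselines or TensorFlow classes: they only mimic an == that returns an elementwise result instead of a single bool. Python's in operator calls == on each list element and then asks for its truth value, which is where the ValueError would come from. The helper is_perturbable (also hypothetical) checks membership by object identity instead, so == is never evaluated and each variable is logged exactly once.

```python
class ElementwiseResult:
    """Hypothetical stand-in for the array an elementwise == produces."""

    def __init__(self, flags):
        self.flags = flags

    def __bool__(self):
        # Mirrors the ambiguity error for results with more than one element.
        if len(self.flags) != 1:
            raise ValueError("The truth value of an array with more than "
                             "one element is ambiguous.")
        return self.flags[0]


class FakeVariable:
    """Hypothetical stand-in for a variable whose == compares elementwise."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def __eq__(self, other):
        return ElementwiseResult(
            [a == b for a, b in zip(self.values, other.values)])

    # Defining __eq__ disables hashing, so restore identity-based hashing.
    __hash__ = object.__hash__


def is_perturbable(var, perturbable_vars):
    """Membership by object identity, so == is never evaluated."""
    return any(var is p_var for p_var in perturbable_vars)


kernel = FakeVariable("kernel", [1.0, 2.0])
bias = FakeVariable("bias", [0.0, 0.0])
layer_norm = FakeVariable("layer_norm", [1.0, 1.0])
perturbable_vars = [kernel, bias]

try:
    bias in perturbable_vars  # compares kernel == bias first, then calls bool()
    in_operator_failed = False
except ValueError:
    in_operator_failed = True

for var in (kernel, bias, layer_norm):
    suffix = " + noise" if is_perturbable(var, perturbable_vars) else ""
    print("  perturbed_{0} <- {0}{1}".format(var.name, suffix))
```

In the real code the same idea would read if any(var is p_var for p_var in actor.perturbable_vars):, assuming perturbable_vars holds the same variable objects as actor.variables. TensorFlow also documents Variable.ref() as a hashable reference that could be used for set membership, but that has not been checked against this branch.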
@gergely-soti I had the same issue and replaced those lines with that code. The problem is that the same "if ... in" check also appears in temporary files such as tmpdun3zvk6.py, which are apparently created on the fly in the temp directory and cannot be edited that way. I suspect a TensorFlow version incompatibility, or the use of unsupported or deprecated code.
I tried it with several environments, and all of them fail at the same point for the DDPG algorithm, with TensorFlow 2.3.1 and Python 3.6.