TensorLayer icon indicating copy to clipboard operation
TensorLayer copied to clipboard

Bug of PPO

Open GIS-PuppetMaster opened this issue 5 years ago • 4 comments

ratio = tf.exp(pi.log_prob(action) - old_pi.log_prob(action)) surr = ratio * adv ... loss = -tf.reduce_mean( tf.minimum(surr, tf.clip_by_value(ratio, 1. - self.epsilon, 1. + self.epsilon) * adv) )

should use ratio in tf.minimum rather than surr, because surr=ration*adv, and there could be negative value in adv, so the result of tf.minimum may contain a value like -1e10, and cause actor's loss failed.

GIS-PuppetMaster avatar Mar 13 '20 11:03 GIS-PuppetMaster

it should be like this: self.cliped_ratio = tf.clip_by_value(self.ratio, 1. - METHOD['epsilon'], 1. + METHOD['epsilon']) self.min_temp = tf.minimum(self.ratio, self.cliped_ratio) self.aloss = -tf.reduce_mean(self.min_temp * self.tfadv)

GIS-PuppetMaster avatar Mar 13 '20 11:03 GIS-PuppetMaster

Why the negative value causes failure in actor loss? You can also refer to OpenAI baselines here, which has similar process as our repo.

quantumiracle avatar Mar 13 '20 15:03 quantumiracle

Why the negative value causes failure in actor loss? You can also refer to OpenAI baselines here, which has similar process as our repo.

I drawed the loss polt and reward plot, when there is a very small negative value, such as 1e-10, the loss will be extremly larger than normally, and the reward stoped increase. I just tried lower learning rate, and there was no such 1e-10 value came out. I wonder if it's the same that use my code above, since it's more robust.

GIS-PuppetMaster avatar Mar 13 '20 16:03 GIS-PuppetMaster

Sorry for the late reply. What you mentioned might be caused by some numerical issues in tf.minimum if I understood correctly. Could you please print out an example case and paste it here? I'm a bit confused by your description since you mentioned both large negative value (-1e10) and small positive value (1e-10). A case showing how it causes a large loss value would be great.

quantumiracle avatar Jan 06 '21 02:01 quantumiracle