baselines PPO2 clip value loss

PPO2 clip value loss

Open yuhsh24 opened this issue 6 years ago • 5 comments

If the value is clipped to reduce variability during critic training, why does the code use tf.maximum(tf.square(vpred - R), tf.square(vpredclipped - R)) instead of tf.minimum? Please explain the reasoning.
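To make the question concrete, here is a minimal plain-Python sketch of the two candidate losses for a single scalar sample. The helper names (clip, clipped_value_loss, combine) are hypothetical and not taken from baselines; vpred_clipped is assumed to be the old prediction plus the clipped change, and the mean over the batch is left out.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def clipped_value_loss(vpred, old_vpred, ret, clip_range, combine=max):
    """Per-sample clipped value loss for scalar inputs (illustrative only).

    combine=max mirrors the tf.maximum expression quoted in the question;
    combine=min is the alternative being asked about.
    """
    # Assumption: the clipped prediction stays within clip_range of the old one.
    vpred_clipped = old_vpred + clip(vpred - old_vpred, -clip_range, clip_range)
    loss_unclipped = (vpred - ret) ** 2
    loss_clipped = (vpred_clipped - ret) ** 2
    return combine(loss_unclipped, loss_clipped)


old_vpred, ret, clip_range = 1.0, 2.0, 0.2

# Case A: the new prediction moved toward the return, past the clip range.
toward_max = clipped_value_loss(1.5, old_vpred, ret, clip_range, combine=max)
toward_min = clipped_value_loss(1.5, old_vpred, ret, clip_range, combine=min)

# Case B: the new prediction moved away from the return, past the clip range.
away_max = clipped_value_loss(0.5, old_vpred, ret, clip_range, combine=max)
away_min = clipped_value_loss(0.5, old_vpred, ret, clip_range, combine=min)

print(toward_max, toward_min, away_max, away_min)
```

In case A the maximum selects the clipped term, which no longer depends on vpred once the clip saturates, so further movement toward the return is not rewarded. In case B the maximum selects the unclipped term, so a prediction that drifted away from the return is still penalized. With the minimum, case B would select the saturated clipped term instead and the drift would go uncorrected.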

Dec 17 '18 08:12 yuhsh24

I have the same question. Also, the clip range in tf.clip_by_value(vpred - OLDVPRED, -CLIPRANGE, CLIPRANGE) looks odd. Shouldn't it be scaled by the old prediction, i.e. tf.clip_by_value(vpred - OLDVPRED, -tf.abs(OLDVPRED) * CLIPRANGE, tf.abs(OLDVPRED) * CLIPRANGE)?
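A small plain-Python sketch may help compare the two clip ranges mentioned here. The function names clip_absolute and clip_relative are hypothetical; the relative variant is only the proposal from this comment, not something baselines is known to implement.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def clip_absolute(vpred, old_vpred, clip_range):
    """Allow the prediction to move at most clip_range in absolute units."""
    return old_vpred + clip(vpred - old_vpred, -clip_range, clip_range)


def clip_relative(vpred, old_vpred, clip_range):
    """Proposed variant: allow a move of at most clip_range * |old_vpred|."""
    bound = abs(old_vpred) * clip_range
    return old_vpred + clip(vpred - old_vpred, -bound, bound)


# Large value scale: the absolute range is tiny compared with the prediction.
abs_large = clip_absolute(130.0, 100.0, 0.2)
rel_large = clip_relative(130.0, 100.0, 0.2)

# Old prediction of zero: the relative bound collapses to zero.
abs_zero = clip_absolute(0.5, 0.0, 0.2)
rel_zero = clip_relative(0.5, 0.0, 0.2)

print(abs_large, rel_large, abs_zero, rel_zero)
```

The absolute range depends on the scale of the returns, which is the oddity raised above. The relative range removes that dependence but pins the clipped prediction to the old one whenever the old prediction is zero, so it has a trade-off of its own.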

Dec 18 '18 07:12 NoelFeiyang

Does anyone have further insight on either/both of the questions posed?

Jan 15 '19 16:01 rallen10

Issue #445 has some discussion on this

Mar 10 '19 11:03 brett-daley

I have the same question too. Searched for an answer but no luck. Please tell me the reason why it uses maximum instead of minimum.

Aug 27 '20 07:08 shtse8

OK, I have the answer now. baselines computes pgloss1 and pgloss2 with a negative sign, so it takes the maximum of the two afterward.
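The sign argument can be checked numerically. The sketch below is plain Python with hypothetical function names, assuming the usual PPO clipped surrogate with a probability ratio and an advantage: taking the maximum of the negated terms gives the same number as negating the minimum of the original terms.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def pg_loss_max_form(adv, ratio, clip_range):
    """Loss written with negated terms and a maximum."""
    loss1 = -adv * ratio
    loss2 = -adv * clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return max(loss1, loss2)


def pg_loss_min_form(adv, ratio, clip_range):
    """Loss written as the negated minimum of the surrogate objectives."""
    surr1 = adv * ratio
    surr2 = adv * clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return -min(surr1, surr2)


# Compare both forms over a small grid of advantages and ratios.
advantages = [-2.0, -0.5, 0.0, 0.5, 2.0]
ratios = [0.5, 0.9, 1.0, 1.1, 1.5]
all_equal = all(
    pg_loss_max_form(a, r, 0.2) == pg_loss_min_form(a, r, 0.2)
    for a in advantages
    for r in ratios
)
print(all_equal)
```

This identity concerns the policy terms. The squared value errors in the original question carry no negative sign, so for them the maximum is a pessimistic choice between the clipped and unclipped error rather than a rewritten minimum.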

Aug 27 '20 07:08 shtse8
