baselines: How does PPO2's value function max-clip reduce variability?

Open Asuka20 opened this issue 4 years ago • 2 comments

Hi, why does the clipped value function loss take the maximum instead of the minimum? Suppose clipping occurs, i.e. v_pred_old < v_clipped < v_pred < R (or the reverse ordering). Then the clipped loss is larger than the unclipped one, and the maximum selects it. Why would that reduce variability?
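To make the case in the question concrete, here is a minimal NumPy sketch of a max-clipped squared-error value loss. It is an illustration, not the baselines code itself: the helper name `clipped_value_loss`, the use of NumPy rather than TensorFlow, and the numbers (old prediction 1.0, new prediction 1.5, return 2.0, clip range 0.2) are all assumptions chosen to satisfy v_pred_old < v_clipped < v_pred < R.

```python
import numpy as np


def clipped_value_loss(v_pred, v_pred_old, returns, clip_range):
    """Hypothetical helper: return (unclipped, clipped, max) squared errors."""
    v_pred = np.asarray(v_pred, dtype=float)
    v_pred_old = np.asarray(v_pred_old, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Keep the new prediction within clip_range of the old prediction.
    v_clipped = v_pred_old + np.clip(v_pred - v_pred_old, -clip_range, clip_range)

    loss_unclipped = (v_pred - returns) ** 2
    loss_clipped = (v_clipped - returns) ** 2

    # Elementwise maximum of the two losses, as described in the question.
    return loss_unclipped, loss_clipped, np.maximum(loss_unclipped, loss_clipped)


# Illustrative numbers for the questioned case: v_pred_old < v_clipped < v_pred < R.
unclipped, clipped, combined = clipped_value_loss(
    v_pred=1.5, v_pred_old=1.0, returns=2.0, clip_range=0.2
)

# Move v_pred a little closer to R while the clip stays saturated.
_, _, combined_shifted = clipped_value_loss(
    v_pred=1.6, v_pred_old=1.0, returns=2.0, clip_range=0.2
)

print("unclipped:", unclipped, "clipped:", clipped, "max:", combined)
print("max after shifting v_pred:", combined_shifted)
```

With these numbers the clipped term is the larger one, so the maximum selects it, which matches the observation in the question. While the clip is saturated, the selected term does not depend on v_pred at all, so moving v_pred further toward R leaves the combined loss unchanged in this sketch.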

Aug 30 '20 16:08 Asuka20

I have the same question, and I don't even understand why the value loss needs to be clipped at all.

Aug 31 '20 03:08 shtse8

I think you should consider a minimum for the clipped loss function.

Sep 20 '20 17:09 viai957
