Proximal-Policy-Optimization-Pytorch

PPO value function clip

Open Asuka20 opened this issue 4 years ago • 0 comments

Hi, why do you use maximum instead of minimum when clipping the value function loss? Suppose clipping occurs, i.e. v_pred_old < v_clipped < v_pred < R (or the reverse ordering). Then the loss will be larger than the unclipped loss. Why would this work to reduce the variability?
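To make the case in the question concrete, here is a minimal sketch for a single sample. It assumes the commonly used form of the clipped value loss, where v_clipped = v_pred_old + clip(v_pred - v_pred_old, -eps, eps) and the final loss is the maximum of the two squared errors. The function name `clipped_value_loss` and the default `clip_eps=0.2` are illustrative assumptions, not taken from the repository.

```python
def clipped_value_loss(v_pred, v_pred_old, ret, clip_eps=0.2):
    """Return (unclipped, clipped, selected) squared-error losses for one sample.

    Hypothetical helper for illustration; clip_eps=0.2 is an assumed value.
    """
    # Limit how far the new prediction may move away from the old one.
    delta = max(-clip_eps, min(clip_eps, v_pred - v_pred_old))
    v_clipped = v_pred_old + delta
    loss_unclipped = (v_pred - ret) ** 2
    loss_clipped = (v_clipped - ret) ** 2
    # The variant discussed in this issue takes the maximum of the two terms.
    return loss_unclipped, loss_clipped, max(loss_unclipped, loss_clipped)


# Ordering from the question: v_pred_old < v_clipped < v_pred < R
# (0.0 < 0.2 < 0.5 < 1.0).
unclipped, clipped, selected = clipped_value_loss(v_pred=0.5, v_pred_old=0.0, ret=1.0)
print(unclipped, clipped, selected)

# Moving v_pred further toward R leaves the selected loss unchanged,
# because v_clipped stays pinned at v_pred_old + clip_eps.
_, _, selected_further = clipped_value_loss(v_pred=0.6, v_pred_old=0.0, ret=1.0)
print(selected_further)
```

In this example the maximum does pick the larger, clipped term, as the question says. That term no longer depends on v_pred once clipping is active, so under this form of the loss the sample would contribute no gradient pushing v_pred further from v_pred_old. This is one possible reading of why maximum is used, not a confirmed answer from the repository author.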

Asuka20 · Aug 30 '20 16:08