baselines
How does the PPO2 value function max-clip reduce variability?
Hi, why does the value function loss take the maximum instead of the minimum when clipping? Suppose clipping occurs, with v_pred_old < v_clipped < v_pred < R (or the reverse ordering). The resulting loss is then larger than the unclipped loss. So why would this work to reduce variability?
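To make the scenario concrete, here is a minimal NumPy sketch. It assumes the value loss has the form 0.5 * mean(max((v_pred - R)^2, (v_clipped - R)^2)) with v_clipped = v_pred_old + clip(v_pred - v_pred_old, -clip_range, clip_range). The function name `value_losses` and the numbers are illustrative only and are not copied from the baselines source.

```python
import numpy as np

def value_losses(v_pred, v_pred_old, returns, clip_range):
    """Return (max-clipped loss, unclipped loss) for the assumed loss form."""
    v_pred = np.asarray(v_pred, dtype=float)
    v_pred_old = np.asarray(v_pred_old, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Keep the new prediction within clip_range of the old prediction.
    v_clipped = v_pred_old + np.clip(v_pred - v_pred_old, -clip_range, clip_range)

    loss_unclipped = (v_pred - returns) ** 2
    loss_clipped = (v_clipped - returns) ** 2

    # Element-wise maximum of the two squared errors (the "max-clip").
    max_clipped = 0.5 * np.mean(np.maximum(loss_unclipped, loss_clipped))
    unclipped = 0.5 * np.mean(loss_unclipped)
    return float(max_clipped), float(unclipped)

# Illustrative numbers: v_pred_old = 0.0 < v_clipped = 0.2 < v_pred = 0.5 < R = 1.0
max_clipped, unclipped = value_losses(0.5, 0.0, 1.0, 0.2)
print(max_clipped, unclipped)
```

With these numbers the max-clipped loss is about 0.32 while the unclipped loss is 0.125, which matches the observation that the loss gets larger. One thing worth noting under this assumed form: when the maximum selects the clipped term and clipping is active, that term does not depend on v_pred, so its gradient with respect to v_pred is zero.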
I have the same question, and I don't even understand why we need to clip the value loss at all.
I think you should consider using a minimum for the clipped loss function.