Results: 2 comments by Ryan Xiao
Hi @lhk, there is one more thing between PPO1 and PPO2 that I don't understand. Maybe I'm wrong, but in PPO1, the model actually maintains both the old and...
Well, I think that depends on the value of R, right? Say we have OLDPRED=0.8, vpred=1, CLIPRANGE=0.1, so vpredclipped would be 0.9. If R is closer to 1, then...
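To make the arithmetic in that example concrete, here is a minimal sketch of a clipped value loss, assuming the common PPO2-style form in which the loss is the larger of the clipped and unclipped squared errors. The function names (`clip`, `value_loss_terms`) and the two sample values of R are hypothetical and chosen only for illustration; they are not taken from the comment.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def value_loss_terms(vpred, old_vpred, returns, clip_range):
    """Return (vpred_clipped, unclipped_loss, clipped_loss, chosen_loss).

    Assumed form: the new prediction may move at most clip_range away
    from the old prediction, and the loss takes the max of both errors.
    """
    vpred_clipped = old_vpred + clip(vpred - old_vpred, -clip_range, clip_range)
    unclipped_loss = (vpred - returns) ** 2
    clipped_loss = (vpred_clipped - returns) ** 2
    return vpred_clipped, unclipped_loss, clipped_loss, max(unclipped_loss, clipped_loss)


# Numbers from the comment: OLDPRED=0.8, vpred=1, CLIPRANGE=0.1.
for R in (1.0, 0.5):  # hypothetical returns, one near vpred and one far below
    vclip, l_unclipped, l_clipped, chosen = value_loss_terms(1.0, 0.8, R, 0.1)
    print(f"R={R}: vpredclipped={vclip:.2f}, "
          f"unclipped={l_unclipped:.4f}, clipped={l_clipped:.4f}, chosen={chosen:.4f}")
```

Under these assumptions the two squared errors are equal at R=0.95, the midpoint between vpred and vpredclipped. For R above that point the clipped term is the larger one and is selected; below it, the unclipped term is selected.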