Ryan Xiao

2 comments by Ryan Xiao

Hi @lhk, there is one more thing about PPO1 and PPO2 that I don't understand. Maybe I'm wrong, but in PPO1 the model actually maintains both the old and...

Well, I think that depends on the value of R, right? Say we have OLDPRED=0.8, vpred=1, CLIPRANGE=0.1, so vpredclipped would be 0.9. If R is closer to 1, then...
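
To make the numbers above concrete, here is a minimal sketch in plain Python. It assumes the value loss takes the larger of the unclipped and clipped squared errors, which is how I understand PPO2-style value clipping. The function name `clipped_value_loss` and the two R values tried at the end are my own choices for illustration, not taken from the library.

```python
def clipped_value_loss(vpred, old_vpred, ret, cliprange):
    """Return (vpredclipped, unclipped loss, clipped loss, final loss).

    Assumed form: 0.5 * max((vpred - R)^2, (vpredclipped - R)^2).
    """
    # Limit how far the new prediction may move away from the old one.
    delta = max(-cliprange, min(cliprange, vpred - old_vpred))
    vpred_clipped = old_vpred + delta

    loss_unclipped = (vpred - ret) ** 2
    loss_clipped = (vpred_clipped - ret) ** 2

    # Taking the max means the pessimistic (larger) error is the one used.
    return vpred_clipped, loss_unclipped, loss_clipped, 0.5 * max(loss_unclipped, loss_clipped)


# Same numbers as in the comment: OLDPRED=0.8, vpred=1, CLIPRANGE=0.1.
for ret in (1.0, 0.8):  # illustrative R values
    clipped, l1, l2, loss = clipped_value_loss(1.0, 0.8, ret, 0.1)
    active = "clipped" if l2 > l1 else "unclipped"
    print(f"R={ret}: vpredclipped={clipped:.2f}, active term={active}, loss={loss:.4f}")
```

With R near 1 the clipped term is the larger one, while with R near 0.8 the unclipped term is, so which branch the max selects really does depend on R.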