Wei-Cheng Lee comments

4 comments by Wei-Cheng Lee

Can't understand reward scaling in value clipping of PPO

https://github.com/DLR-RM/stable-baselines3/blob/5a70af8abddfc96eac3911e69b20f19f5220947c/stable_baselines3/ppo/ppo.py#L230-L235 Sorry, I used the wrong link. I don't understand this comment: "# NOTE: this depends on the reward scaling". The value I am referring to is the variable named...
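A minimal sketch of why value clipping "depends on the reward scaling", assuming the clip works like the linked code: the new value prediction is kept within an absolute band of `clip_range_vf` around the old prediction. The helper `clip_value` is hypothetical and uses plain floats instead of tensors.

```python
def clip_value(new_value: float, old_value: float, clip_range_vf: float) -> float:
    """Keep new_value within +/- clip_range_vf of old_value (absolute units)."""
    delta = new_value - old_value
    delta = max(-clip_range_vf, min(clip_range_vf, delta))
    return old_value + delta


# Same relative update (+50%), same clip range, different reward scales.
small = clip_value(new_value=1.5, old_value=1.0, clip_range_vf=0.2)
large = clip_value(new_value=150.0, old_value=100.0, clip_range_vf=0.2)

print(small)  # 1.2   -> the value may move 20% of its size
print(large)  # 100.2 -> the value may move only 0.2% of its size
```

The policy clip range acts on a probability ratio, which is unitless, so it behaves the same for any reward scale. The value clip range is measured in the same units as the returns, so multiplying the rewards by a constant changes how restrictive a fixed `clip_range_vf` is.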

Can't understand reward scaling in value clipping of PPO

I think the value is the output of the value network given the current observation. How does that relate to my question? Sorry, I may need more help.

Can't understand reward scaling in value clipping of PPO

I think I understand what you meant now, based on this earlier post: https://github.com/hill-a/stable-baselines/issues/216. I think what you mean is that the value function is the cumulated future...
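The link between the value network and reward scale can be sketched with a plain discounted return, assuming Monte Carlo returns with no bootstrapping (SB3 actually builds its value targets from GAE, so this is a simplification). The value network is regressed onto targets like these, so if every reward is multiplied by a constant, the targets, and therefore the learned values, are multiplied by the same constant. `discounted_returns` is a hypothetical helper.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed backwards."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


base = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
scaled = discounted_returns([10.0, 10.0, 10.0], gamma=0.5)
print(base)    # [1.75, 1.5, 1.0]
print(scaled)  # [17.5, 15.0, 10.0]
```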

Can't understand reward scaling in value clipping of PPO

Thank you. I read the reference, but I still don't understand and need some further explanation. Here is what I understand now. We need to give the advantage a nice scale....
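Per-batch advantage normalization can be sketched as below, assuming standard mean/std normalization with a small epsilon (SB3's PPO applies a similar step when `normalize_advantage` is enabled, using a sample standard deviation). `normalize_advantages` is a hypothetical helper. After this step the advantages used in the policy loss are nearly independent of the reward scale.

```python
import statistics


def normalize_advantages(advantages, eps=1e-8):
    """Shift to zero mean and rescale to unit (sample) standard deviation."""
    mean = statistics.fmean(advantages)
    std = statistics.stdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


print(normalize_advantages([1.0, 2.0, 3.0]))        # approximately [-1, 0, 1]
print(normalize_advantages([100.0, 200.0, 300.0]))  # approximately [-1, 0, 1]
```

The returns that the value network is trained on are not normalized this way, so they keep the raw reward scale. That is one reason a fixed value clip range interacts with reward scaling while the policy side does not (unless rewards are normalized elsewhere, for example with a reward-normalizing wrapper).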