Reward modification in PPO

Open Ynjxsjmh opened this issue 4 years ago • 2 comments

https://github.com/marload/DeepRL-TensorFlow2/blob/876266d9a5fcf7d8a7c7e3afd8b110085b32b615/PPO/PPO_Discrete.py#L151-L154

https://github.com/marload/DeepRL-TensorFlow2/blob/876266d9a5fcf7d8a7c7e3afd8b110085b32b615/PPO/PPO_Continuous.py#L167-L170

In PPO_Discrete each reward is multiplied by 0.01, and in PPO_Continuous the reward is also modified. I don't understand why these modifications are made. What do they do?

Feb 21 '21 04:02 Ynjxsjmh

same question

Feb 25 '21 03:02 ghost

Multiplying by 0.01 is probably meant to shrink the reward so that it stays between 0 and 1 (my guess).

Mar 08 '23 08:03 huojitiaotiao
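
To make the guess above concrete, here is a minimal sketch of what multiplying rewards by a positive constant does to the quantities PPO trains on. It is not the repository's code: the helper `discounted_returns`, the toy reward sequence, and `gamma=0.99` are assumptions chosen for illustration; only the 0.01 factor comes from the question. The PPO_Continuous modification is not reproduced here.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Hypothetical helper: discounted return G_t for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Toy episode with a reward of 1 per step (assumed values, for illustration only).
raw_rewards = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
scale = 0.01  # the factor mentioned in the question

raw_returns = discounted_returns(raw_rewards)
scaled_returns = discounted_returns(raw_rewards * scale)

# Critic loss of an untrained value network that predicts 0 everywhere.
raw_loss = np.mean(raw_returns ** 2)
scaled_loss = np.mean(scaled_returns ** 2)

print("raw returns:   ", raw_returns)
print("scaled returns:", scaled_returns)
print("critic loss ratio (scaled / raw):", scaled_loss / raw_loss)
```

Scaling by a positive constant shrinks every return by the same factor, so the ordering of good and bad steps is unchanged, while a squared-error critic loss shrinks by the square of that factor. Smaller value targets tend to keep the critic's gradients in a more comfortable range, which is one plausible reason for the 0.01 factor, though the thread does not confirm the author's intent.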