DeepRL-TensorFlow2
Reward modification in PPO
https://github.com/marload/DeepRL-TensorFlow2/blob/876266d9a5fcf7d8a7c7e3afd8b110085b32b615/PPO/PPO_Discrete.py#L151-L154
https://github.com/marload/DeepRL-TensorFlow2/blob/876266d9a5fcf7d8a7c7e3afd8b110085b32b615/PPO/PPO_Continuous.py#L167-L170
In PPO_Discrete each reward is multiplied by 0.01, and in PPO_Continuous the reward is also modified. I don't understand why these modifications are made. What do they do?
same question
Multiplying by 0.01 is probably meant to shrink the reward so that it stays between 0 and 1 (this is my guess).
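To make the guess above concrete, here is a minimal sketch of what a constant reward scale does to discounted returns. It is not the repository's code: the helper `discounted_returns`, the 200-step episode with a reward of 1.0 per step, and `gamma=0.99` are all assumptions chosen for illustration. Only the 0.01 factor comes from the question.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Hypothetical helper: discounted return G_t for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so that G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Assumed episode: 200 steps with a reward of 1.0 each (not taken from the repo).
rewards = np.ones(200)

raw = discounted_returns(rewards)
scaled = discounted_returns(0.01 * rewards)

# Scaling every reward by a constant scales every return by the same constant.
print(raw[0], scaled[0])
print(np.allclose(scaled, 0.01 * raw))
```

With these assumed numbers the first-step return drops from roughly 87 to roughly 0.87, so the targets the critic has to fit are much smaller, while the ordering of good and bad trajectories is unchanged. Note that the scaled per-step rewards only land in the 0 to 1 range if the raw rewards lie between 0 and 100, so whether the guess holds depends on the environment.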