baselines
policy entropy in PPO2
Hi, when applying PPO2 to a custom MuJoCo environment, the policy entropy increases continuously, even with a small entropy coefficient of 0.01 or less. My understanding is that the policy entropy should ideally decrease over time. What could be the issue? Also, any suggestions on managing the entropy coefficient to encourage sufficient exploration would be appreciated.
Regards,
Hi Ashilsanand, have you found a solution for this?
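One thing that may help when diagnosing this: for continuous-action (MuJoCo-style) tasks the policy is typically a diagonal Gaussian, whose entropy depends only on the log standard deviation and not on the mean. A steadily rising entropy therefore suggests the learned log-std is growing, which the entropy bonus can drive, since the loss subtracts the entropy term scaled by the coefficient. The sketch below is only an illustration under that assumption. `diag_gaussian_entropy` and `linear_ent_coef` are hypothetical helper names, not part of baselines, and as far as I know PPO2 in baselines takes `ent_coef` as a fixed float, so annealing it would require modifying the training loop.

```python
import numpy as np


def diag_gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian: sum_i (log_std_i + 0.5 * log(2*pi*e)).

    Note that the mean does not appear, so the entropy can only change
    through the log standard deviation.
    """
    log_std = np.asarray(log_std, dtype=np.float64)
    return float(np.sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e)))


def linear_ent_coef(progress_remaining, start=0.01, end=0.0):
    """Hypothetical linear schedule for the entropy coefficient.

    progress_remaining goes from 1.0 (start of training) to 0.0 (end),
    so the coefficient moves from `start` to `end`.
    """
    p = min(max(progress_remaining, 0.0), 1.0)
    return end + (start - end) * p


if __name__ == "__main__":
    # A wider policy (larger log-std) has higher entropy.
    print(diag_gaussian_entropy([0.0, 0.0]))
    print(diag_gaussian_entropy([0.5, 0.5]))
    # Coefficient at the start, midpoint, and end of training.
    print([linear_ent_coef(p) for p in (1.0, 0.5, 0.0)])
```

Logging the policy's log-std alongside the entropy would show whether this is what is happening. If it is, trying an entropy coefficient of 0.0 (or annealing it toward zero as sketched) might be worth a test, although whether that still gives enough exploration will depend on the environment.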