character-motion-vaes

Question about "dist_entropy" when updating ppo

Open quintus0505 opened this issue 2 years ago • 2 comments

Hi, I am reading your code and have a question about evaluate_actions in the PPO update:

  • https://github.com/electronicarts/character-motion-vaes/blob/main/algorithms/ppo.py#L95

I notice that you compute dist_entropy along with the action and value losses, all of which take part in backpropagation. dist_entropy currently has no effect because entropy_coef defaults to 0 in your code, but I am still curious how it works and why you use it. (Also, what exactly is "An ugly hack for my KFAC implementation."? :stuck_out_tongue_closed_eyes:)

Thanks

quintus0505 Sep 10 '21 07:09

The dist_entropy term is described in the original PPO paper (https://arxiv.org/pdf/1707.06347.pdf); see Equation 9. The trick has been used in earlier papers as well. The purpose is to encourage exploration.
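
For reference, Equation 9 of that paper combines the clipped surrogate objective, the value-function error, and an entropy bonus $S$, weighted by coefficients $c_1$ and $c_2$. Reading entropy_coef as $c_2$ is an assumption about how the names map, not something the paper states:

$$
L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2\, S[\pi_\theta](s_t) \right]
$$

This objective is maximized, so when it is written as a loss to minimize, the entropy term appears with a negative sign: higher policy entropy lowers the loss.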

We set entropy_coef to 0 because the policy already explores enough to solve the task without the entropy bonus. But if the policy gets stuck in a local minimum, increasing entropy_coef might help it find a better solution.
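
As a rough illustration of how the entropy term enters the update, here is a minimal sketch in plain Python. The function names, the default value of value_loss_coef, and the diagonal-Gaussian policy are assumptions made for illustration; they are not copied from algorithms/ppo.py.

```python
import math


def diag_gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian policy, summed over action dimensions.

    Each dimension contributes 0.5 + 0.5 * log(2 * pi) + log(std).
    """
    return sum(0.5 + 0.5 * math.log(2.0 * math.pi) + ls for ls in log_std)


def ppo_total_loss(action_loss, value_loss, dist_entropy,
                   value_loss_coef=0.5, entropy_coef=0.0):
    """Scalar loss to minimize (hypothetical helper, not the repo's API).

    The entropy is subtracted, so a larger entropy_coef rewards
    wider (more exploratory) action distributions.
    """
    return value_loss * value_loss_coef + action_loss - dist_entropy * entropy_coef


# Two policies that differ only in how wide their action distribution is.
narrow = diag_gaussian_entropy([-2.0, -2.0])
wide = diag_gaussian_entropy([0.0, 0.0])

# With entropy_coef = 0 the entropy has no influence on the loss.
same_a = ppo_total_loss(0.1, 0.4, narrow)
same_b = ppo_total_loss(0.1, 0.4, wide)

# With entropy_coef > 0 the wider policy gets the lower loss.
loss_narrow = ppo_total_loss(0.1, 0.4, narrow, entropy_coef=0.01)
loss_wide = ppo_total_loss(0.1, 0.4, wide, entropy_coef=0.01)
```

With entropy_coef left at 0, same_a and same_b are identical, which matches the observation that dist_entropy currently has no effect. A positive coefficient makes loss_wide smaller than loss_narrow, which is the mechanism that encourages exploration.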

belinghy Sep 21 '21 19:09

Thanks!

quintus0505 Sep 28 '21 04:09