baselines
Why is the loss function formula different in the PPO code and the paper?
Hi,
In the code, the total loss is calculated like this:
loss = pg_loss - entropy * ent_coef + vf_loss * vf_coef
In the PPO paper, the objective is given by Eqn 10 (https://arxiv.org/abs/1707.06347).
Why do the two formulas differ? Can someone explain? Thank you.
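Because the paper maximizes an objective, while in code we usually minimize a loss. Negating the paper's objective flips the sign of every term: the clipped policy term becomes pg_loss (already negated in the code), the value-function error is added with vf_coef, and the entropy bonus is subtracted with ent_coef.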
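To make the sign flip concrete, here is a small sketch rather than the actual baselines code. It assumes the paper's combined objective has the form L_CLIP - c1 * L_VF + c2 * S (to be maximized), that pg_loss in the code is the negated clipped surrogate, and that vf_coef and ent_coef play the roles of c1 and c2. The function names and the numbers are made up for illustration.

```python
import math


def paper_objective(l_clip, l_vf, entropy, c1, c2):
    # Objective in the paper's form (to be MAXIMIZED):
    # clipped surrogate minus value error plus entropy bonus.
    return l_clip - c1 * l_vf + c2 * entropy


def code_loss(pg_loss, vf_loss, entropy, vf_coef, ent_coef):
    # Loss in the form quoted from the code (to be MINIMIZED).
    return pg_loss - entropy * ent_coef + vf_loss * vf_coef


# Illustrative values only, not taken from any real training run.
l_clip, l_vf, entropy = 0.12, 0.8, 1.5
c1, c2 = 0.5, 0.01

objective = paper_objective(l_clip, l_vf, entropy, c1, c2)

# Assumption: pg_loss is the negated clipped surrogate, i.e. pg_loss = -l_clip.
loss = code_loss(pg_loss=-l_clip, vf_loss=l_vf, entropy=entropy,
                 vf_coef=c1, ent_coef=c2)

# Minimizing `loss` is the same as maximizing `objective`.
assert math.isclose(loss, -objective)
```

Under these assumptions the two expressions are exact negatives of each other, so gradient descent on the code's loss takes the same steps as gradient ascent on the paper's objective.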