baselines

Why is the loss function formula different in the PPO code and the paper?

Open bas1003 opened this issue 5 years ago • 1 comments

Hi,

In the code, the total loss is calculated like this: loss = pg_loss - entropy * ent_coef + vf_loss * vf_coef. In the PPO paper, however, the objective is $L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2 S[\pi_\theta](s_t) \right]$ (Eqn 9, https://arxiv.org/abs/1707.06347).

Why do the signs differ between the code and the paper? Can someone explain? Thank you.

bas1003 avatar Jun 21 '20 15:06 bas1003

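Because the paper states an objective to be maximized, while code usually minimizes a loss. Negating the paper's objective flips the sign of every term, which gives the formula used in the code.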

BingyuZhou avatar Dec 14 '20 21:12 BingyuZhou
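
To make the sign flip concrete, here is a minimal sketch with plain numbers. It assumes that pg_loss in the code is the negated clipped surrogate from the paper (pg_loss = -L^CLIP), that vf_coef and ent_coef play the roles of c1 and c2, and the function names and default coefficients are made up for illustration only; this is not the actual baselines implementation.

```python
def paper_objective(l_clip, l_vf, entropy, c1=0.5, c2=0.01):
    """Paper form, to be MAXIMIZED: L^CLIP - c1 * L^VF + c2 * S."""
    return l_clip - c1 * l_vf + c2 * entropy


def code_loss(l_clip, l_vf, entropy, vf_coef=0.5, ent_coef=0.01):
    """Code form, to be MINIMIZED: pg_loss - entropy * ent_coef + vf_loss * vf_coef."""
    pg_loss = -l_clip  # assumption: policy loss is the negated surrogate
    vf_loss = l_vf
    return pg_loss - entropy * ent_coef + vf_loss * vf_coef


# Hypothetical scalar values standing in for batch averages.
l_clip, l_vf, entropy = 0.3, 1.2, 0.7

objective = paper_objective(l_clip, l_vf, entropy)
loss = code_loss(l_clip, l_vf, entropy)

# The two forms differ only by an overall sign.
print(abs(loss + objective) < 1e-12)
```

Under these assumptions, minimizing the loss with a gradient-descent optimizer is the same as maximizing the paper's objective.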