baselines Why the loss function formula is different in PPO code and paper?

Open bas1003 opened this issue 5 years ago • 1 comments

Hi,

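In the code, the total loss is computed as loss = pg_loss - entropy * ent_coef + vf_loss * vf_coef, while the PPO paper (Eqn 9, https://arxiv.org/abs/1707.06347) writes the objective as

$$L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2 S[\pi_\theta](s_t) \right]$$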

Why is the loss function formula in the PPO code different from the one in the paper? Can someone explain why? Thank you.

Jun 21 '20 15:06 bas1003

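Because the paper maximizes its objective, while in code we usually minimize a loss. Negating the paper's objective flips the sign of every term: pg_loss is the negated clipped surrogate, the value-function error is added (vf_loss * vf_coef) instead of subtracted, and the entropy bonus is subtracted (entropy * ent_coef) instead of added. Minimizing the code's loss is therefore the same as maximizing the paper's objective.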
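To make the sign flip concrete, here is a minimal NumPy sketch, not the actual baselines implementation. The function names `paper_objective` and `code_loss`, the plain squared-error value loss, and the coefficient defaults are assumptions for illustration; the real code also clips the value function, which is left out here. It only shows that the code-style loss equals the negated paper-style objective when c1 = vf_coef and c2 = ent_coef.

```python
import numpy as np


def paper_objective(ratio, adv, values, returns, entropy, c1=0.5, c2=0.01, eps=0.2):
    """Paper-style objective (to be MAXIMIZED): L_CLIP - c1 * L_VF + c2 * S."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    l_clip = np.mean(np.minimum(ratio * adv, clipped * adv))
    l_vf = np.mean((values - returns) ** 2)  # simplified value loss (assumption)
    return l_clip - c1 * l_vf + c2 * np.mean(entropy)


def code_loss(ratio, adv, values, returns, entropy, vf_coef=0.5, ent_coef=0.01, eps=0.2):
    """Code-style loss (to be MINIMIZED): pg_loss - entropy * ent_coef + vf_loss * vf_coef."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # max(-a, -b) == -min(a, b), so pg_loss is the negated clipped surrogate.
    pg_loss = np.mean(np.maximum(-adv * ratio, -adv * clipped))
    vf_loss = np.mean((values - returns) ** 2)  # same simplified value loss
    return pg_loss - np.mean(entropy) * ent_coef + vf_loss * vf_coef


# Hypothetical random batch, used only to compare the two formulas.
rng = np.random.default_rng(0)
ratio = rng.uniform(0.5, 1.5, size=64)    # pi_new / pi_old
adv = rng.normal(size=64)                 # advantage estimates
values = rng.normal(size=64)              # value predictions
returns = rng.normal(size=64)             # empirical returns
entropy = rng.uniform(0.0, 1.0, size=64)  # per-sample policy entropy

loss = code_loss(ratio, adv, values, returns, entropy)
objective = paper_objective(ratio, adv, values, returns, entropy)
print(np.isclose(loss, -objective))
```

Gradient descent on `code_loss` then moves the parameters in the same direction as gradient ascent on `paper_objective`, which is why optimizers that only minimize can be used unchanged.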

Dec 14 '20 21:12 BingyuZhou
