I think that the advantage value here should be based on the old actor:
target_v = reward + args.gamma * self.critic_net(next_state)
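
For reference, here is a minimal sketch of how the target value and advantage could be computed while keeping the value estimates fixed for the update (no gradient flows through them). The names `reward`, `state`, `next_state`, `args.gamma`, and `self.critic_net` follow the snippet above; everything else is an assumption, not the repository's actual code:

```python
import torch

# Sketch only: evaluate the critic under no_grad so the value estimates
# used for the target and the advantage are treated as fixed ("old")
# during this policy update.
with torch.no_grad():
    # Bootstrapped TD target: r + gamma * V(s')
    target_v = reward + args.gamma * self.critic_net(next_state)
    # Advantage: TD target minus the value of the current state
    advantage = target_v - self.critic_net(state)
```

The detached `advantage` can then be plugged into the clipped surrogate loss, while the critic is trained separately against `target_v`.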