Deep-reinforcement-learning-with-pytorch

About the advantage values in PPO2

Open Hardlygo opened this issue 3 years ago • 0 comments

I think the advantage value here should be based on the old actor: target_v = reward + args.gamma * self.critic_net(next_state)
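
One possible reading of this suggestion is that target_v and the advantage should be computed once, from the networks as they were before the update, and then held fixed across the PPO update epochs. The sketch below illustrates that reading in plain Python. The linear critic, the single transition, and the names critic, td_target, advantage, w_old, and lr are all hypothetical stand-ins, not code from the repository; only the target_v formula is taken from the issue.

```python
# Minimal sketch (assumptions: a scalar linear critic v(s) = w * s and one
# hand-written transition; none of this is the repository's actual code).

def critic(w, state):
    # Hypothetical stand-in for self.critic_net(state).
    return w * state

def td_target(w, reward, next_state, gamma):
    # Mirrors: target_v = reward + args.gamma * self.critic_net(next_state)
    return reward + gamma * critic(w, next_state)

def advantage(w, state, reward, next_state, gamma):
    # One-step advantage estimate: TD target minus the value of the state.
    return td_target(w, reward, next_state, gamma) - critic(w, state)

gamma = 0.9
state, reward, next_state = 1.0, 1.0, 2.0
w_old = 0.5  # critic weight before any update epoch

# Compute the target and the advantage once, from the pre-update critic.
fixed_target = td_target(w_old, reward, next_state, gamma)
fixed_adv = advantage(w_old, state, reward, next_state, gamma)

# Several update epochs on the critic, with the target held constant.
w = w_old
lr = 0.1
for _ in range(3):
    error = fixed_target - critic(w, state)
    w += lr * error * state  # gradient step on 0.5 * error**2

# Recomputing after the updates gives a different number, so an advantage
# recomputed inside the loop would drift as the critic changes.
recomputed_adv = advantage(w, state, reward, next_state, gamma)
print(fixed_adv, recomputed_adv)
```

In a PyTorch training loop the same idea is usually expressed by computing these quantities under torch.no_grad() (or calling .detach() on them) before the epoch loop, so that they act as constants in the actor loss.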

Hardlygo, Jul 13 '21 08:07