Deep-reinforcement-learning-with-pytorch
I don't think PPO on Pendulum is converging
Yes, the problem is that the activation function is chosen incorrectly.
I don't think this repo implements PPO correctly either.
Change the activation function from relu to tanh.
Right, change relu to tanh in the actor network.
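For anyone landing here, a minimal sketch of what the suggested change might look like. The class name `Actor`, the layer sizes, and the scaled-tanh mean head are assumptions for illustration, not the repo's actual code; the only part taken from this thread is swapping relu for tanh in the actor. It assumes the standard Pendulum setup (3-dim observation, 1-dim torque in [-2, 2]).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Actor(nn.Module):
    """Hypothetical Gaussian policy for Pendulum (obs dim 3, torque in [-2, 2])."""

    def __init__(self, state_dim=3, hidden_dim=100, max_action=2.0):
        super().__init__()
        self.fc = nn.Linear(state_dim, hidden_dim)
        self.mu_head = nn.Linear(hidden_dim, 1)
        self.sigma_head = nn.Linear(hidden_dim, 1)
        self.max_action = max_action

    def forward(self, state):
        # The change suggested in this thread: tanh instead of relu here.
        x = torch.tanh(self.fc(state))
        # Assumption: squash the mean into the valid torque range.
        mu = self.max_action * torch.tanh(self.mu_head(x))
        # softplus keeps the standard deviation strictly positive.
        sigma = F.softplus(self.sigma_head(x)) + 1e-5
        return mu, sigma


actor = Actor()
mu, sigma = actor(torch.zeros(4, 3))  # batch of 4 dummy observations
dist = torch.distributions.Normal(mu, sigma)
action = dist.sample().clamp(-2.0, 2.0)  # keep sampled torque in bounds
```

Whether this alone fixes convergence likely also depends on the rest of the PPO update (advantage normalization, clipping, learning rate), so treat it as a starting point.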