Deep-reinforcement-learning-with-pytorch

About PPO

Open LpLegend opened this issue 4 years ago • 4 comments

I don't think this code can solve the Pendulum problem. Another question: why is the reward computed as 'running_reward * 0.9 + score * 0.1'?

LpLegend avatar Jan 20 '21 08:01 LpLegend
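On the 'running_reward * 0.9 + score * 0.1' question: the expression has the form of an exponential moving average of episode scores, which is commonly used to smooth a noisy training curve for logging. A minimal sketch of that reading follows; the function name `update_running_reward`, the `decay` parameter, and the sample scores are hypothetical and not taken from the repository.

```python
def update_running_reward(running_reward, score, decay=0.9):
    """Exponential moving average of episode scores.

    With decay=0.9 this matches running_reward * 0.9 + score * 0.1.
    The name and signature are illustrative, not the repository's.
    """
    return running_reward * decay + score * (1.0 - decay)


# Made-up episode scores, only to show the smoothing effect.
scores = [-1200.0, -900.0, -1100.0, -400.0, -300.0]
running_reward = scores[0]  # assumption: initialise with the first score
history = []
for score in scores[1:]:
    running_reward = update_running_reward(running_reward, score)
    history.append(running_reward)
```

Under this reading the value only affects what is printed or used as a stopping criterion; it would not change the rewards that the PPO update itself sees.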

I changed the activation function from relu to tanh, but there was no improvement.

LpLegend avatar Jan 20 '21 08:01 LpLegend

I don't think this code can solve the Pendulum problem. Another question: why is the reward computed as 'running_reward * 0.9 + score * 0.1'?

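I ran into this problem too. I asked the author of elegantrl, and he said that applying tanh first and then sampling the action through torch.distributions affects the entropy, so training cannot converge. However, I don't like the way elegantrl writes its PPO, so I'm still looking for someone else's code.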

heyfavour avatar Jul 21 '21 08:07 heyfavour
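To make the ordering issue in the comment above concrete, here is a small sketch in plain Python (no PyTorch, so it runs anywhere) that contrasts two ways of producing a bounded action. The first squashes the mean with tanh and then samples, which is one reading of the ordering described in the comment. The second samples first and then applies tanh, with the usual change-of-variables correction to the log-probability. All function names are hypothetical, a unit action range is assumed, and this is not claimed to be what elegantrl or this repository does.

```python
import math
import random


def normal_log_prob(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))


def sample_tanh_then_gaussian(raw_mean, std, rng):
    """Squash the mean first, then sample: the action is NOT bounded."""
    mean = math.tanh(raw_mean)
    action = rng.gauss(mean, std)  # can fall outside [-1, 1]
    return action, normal_log_prob(action, mean, std)


def sample_gaussian_then_tanh(raw_mean, std, rng, eps=1e-6):
    """Sample first, then squash: the action always lies in [-1, 1]."""
    u = rng.gauss(raw_mean, std)
    action = math.tanh(u)
    # Change of variables: log pi(a) = log N(u) - log(1 - tanh(u)^2).
    # eps guards against log(0) when tanh saturates.
    log_prob = normal_log_prob(u, raw_mean, std) - math.log(1.0 - action ** 2 + eps)
    return action, log_prob


rng = random.Random(0)
unbounded = [sample_tanh_then_gaussian(3.0, 1.0, rng)[0] for _ in range(1000)]
bounded = [sample_gaussian_then_tanh(3.0, 1.0, rng)[0] for _ in range(1000)]
```

With the first variant, any clipping the environment applies afterwards is invisible to the log-probability and entropy used in the PPO loss, which may be related to the convergence problem discussed here. In PyTorch, the second variant can be built from `torch.distributions.Normal` together with `torch.distributions.transforms.TanhTransform`; whether that fixes this particular code is untested.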

Have you found working code yet? Could you share a link? Much appreciated!

CoulsonZhao avatar Aug 25 '21 02:08 CoulsonZhao