ppo BreakoutNoFrameskip-v4 does not converge

Open initial-h opened this issue 5 years ago • 3 comments

The final mean reward is only around 40, and it oscillates a lot.

Feb 20 '20 01:02 initial-h

Hi, I found another confusing point in rlsaber.trainer. In the Trainer class, the loop breaks once the game is done, and the environment is reset. In the BatchTrainer class, however, I could not find a reset call for when an env is done, which seems odd.

Feb 20 '20 02:02 initial-h

@initial-h Hi! Thank you for the additional question. For the performance instability issue, could you try a different random seed or increase the number of actors? Either may improve stability.

BatchTrainer uses BatchEnvWrapper included in rlsaber. The reset function is actually called here. If you still have any questions, please feel free to ask me :)

Feb 23 '20 14:02 takuseno
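
To illustrate the reply above, here is a minimal sketch of how a batch wrapper can reset a finished env inside its own step function, so that the training loop never needs an explicit reset call. This is not rlsaber's actual BatchEnvWrapper code: ToyEnv, AutoResetBatchEnv, and demo are hypothetical names, and the env is a toy stand-in with a Gym-style reset/step interface.

```python
class ToyEnv:
    """Hypothetical Gym-style env whose episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0
        self.reset_count = 0

    def reset(self):
        self.t = 0
        self.reset_count += 1
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class AutoResetBatchEnv:
    """Hypothetical wrapper: steps several envs in lockstep and resets
    any env whose episode has just ended (assumed behaviour, not rlsaber's code)."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            ob, reward, done, _ = env.step(action)
            if done:
                # Return the first observation of the next episode
                # in place of the terminal observation.
                ob = env.reset()
            obs.append(ob)
            rewards.append(reward)
            dones.append(done)
        return obs, rewards, dones


def demo():
    batch = AutoResetBatchEnv([ToyEnv(2), ToyEnv(3)])
    batch.reset()
    history = []
    for _ in range(6):
        obs, rewards, dones = batch.step([0, 0])
        history.append((obs, dones))
    return history, [env.reset_count for env in batch.envs]
```

With this design the caller still sees the done flag for each env, so it can cut off reward bootstrapping at episode boundaries, while the wrapper itself keeps every env running.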

Thank you for your reply. It works well on Pong, but it still has no effect on Breakout. I just want to write a PPO implementation and test it on some games. I wrote a similar one based on your code; it runs without errors, but it does not learn: the reward does not increase. Are there any tricks I should be careful about? Thanks.

Mar 02 '20 14:03 initial-h
