
BreakoutNoFrameskip-v4 does not converge

Open initial-h opened this issue 5 years ago • 3 comments

The final mean reward is only around 40, and it oscillates a lot.

initial-h avatar Feb 20 '20 01:02 initial-h

Hi, I found another point of confusion in rlsaber.trainer. In the Trainer class, the loop breaks when the game is done, and the env is reset. In the BatchTrainer class, however, I could not find any reset op for when an env is done, which seems weird.

initial-h avatar Feb 20 '20 02:02 initial-h

@initial-h Hi! Thank you for your follow-up question. For the performance instability issue, could you try a different random seed or increase the number of actors? That may improve stability.

BatchTrainer uses BatchEnvWrapper included in rlsaber. The reset function is actually called here. If you still have any questions, please feel free to ask me :)
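To illustrate the idea of resetting inside the wrapper rather than in the trainer loop, here is a minimal sketch. It is not rlsaber's actual code: `ToyEnv` and `AutoResetBatchEnv` are hypothetical names, and the Gym-style `reset()` / `step()` interface returning `(obs, reward, done, info)` is an assumption.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym-style env; an episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class AutoResetBatchEnv:
    """Steps several envs together and resets each one as soon as it reports done."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs_list, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, _ = env.step(action)
            if done:
                # The finished env restarts here, so the training loop
                # never needs to call reset itself.
                obs = env.reset()
            obs_list.append(obs)
            rewards.append(reward)
            dones.append(done)
        return obs_list, rewards, dones


batch = AutoResetBatchEnv([ToyEnv(2), ToyEnv(3)])
batch.reset()
history = []
for _ in range(3):
    obs, rewards, dones = batch.step([0, 0])
    history.append((obs, dones))
```

With a wrapper like this, the `done` flag is still reported for the step on which the episode ended, but the observation returned alongside it already belongs to the next episode.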

takuseno avatar Feb 23 '20 14:02 takuseno

Thank you for your reply. It works well on Pong, but it still has no effect on Breakout. I just want to write a PPO implementation and test it on some games. I wrote a similar one based on your code; it runs without errors, but it does not work: the reward does not increase. So I want to ask whether there are any tricks I should take care of. Thanks.

initial-h avatar Mar 02 '20 14:03 initial-h