ppo
BreakoutNoFrameskip-v4 does not converge
The final mean reward is only around 40, and it oscillates a lot.
Hi, I found another confusing point in rlsaber.trainer. In the Trainer class, the loop breaks and the environment is reset when the game is done, but in the BatchTrainer class I could not find a reset op for when an env is done, which seems odd.
@initial-h Hi! Thank you for the additional question. For the performance instability issue, could you try a different random seed or increase the number of actors? It may improve stability.
BatchTrainer uses BatchEnvWrapper included in rlsaber. The reset function is actually called here. If you still have any questions, please feel free to ask me :)
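As a rough illustration of the idea (this is not the actual rlsaber code), a batch wrapper can reset each environment inside its own `step`, so the trainer loop never needs an explicit reset. The names `CountdownEnv` and `AutoResetBatchEnv` below are hypothetical, and the toy environment only assumes the classic `reset()` / `step(action) -> (obs, reward, done, info)` interface:

```python
class CountdownEnv:
    """Hypothetical toy env: an episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class AutoResetBatchEnv:
    """Hypothetical wrapper: steps several envs and resets finished ones."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs_batch, reward_batch, done_batch = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, _ = env.step(action)
            if done:
                # Swap the terminal observation for the first observation
                # of the next episode; `done` is still reported as True.
                obs = env.reset()
            obs_batch.append(obs)
            reward_batch.append(reward)
            done_batch.append(done)
        return obs_batch, reward_batch, done_batch


batch = AutoResetBatchEnv([CountdownEnv(2), CountdownEnv(3)])
batch.reset()
history = [batch.step([0, 0]) for _ in range(3)]
```

With this design the trainer still sees `done=True` at the episode boundary, so it can cut off bootstrapping there, while the observation it receives already belongs to the new episode.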
Thank you for your reply. It works well on Pong, but it still has no effect on Breakout. I just want to write a PPO implementation and test it on some games. I wrote a similar one based on your code; it runs without errors, but it does not work: the reward does not increase. So I want to ask whether there are any tricks I should pay attention to. Thanks.