Reinforcement-Learning
Accuracy improvement possible?
I'm running the code verbatim but not getting the results I expected. For example, running ping_pong_a2c shows barely any improvement after more than 8,000 runs, whereas I would expect a reasonable level of performance (at least a score > 0) by around 5,000 iterations, based on results other people have reported for RL on Atari Pong.
Is there something I'm missing? Do the hyperparameters need to be tuned rather than run as is?
Thank you for creating the code base.
No, it does not converge. I spent days debugging this code to find out why, but couldn't narrow it down to the exact issue. Use the OpenAI Gym wrappers to preprocess the frames instead.
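For anyone wondering what "manipulating the frames" usually means here, below is a rough, dependency-free sketch of the typical Atari preprocessing steps: convert to grayscale, downsample, and stack the last few frames so the agent can see motion. This is not the repo's code and not the Gym API; `to_grayscale`, `downsample`, and `FrameStacker` are hypothetical names made up for illustration, and the downsample factor and stack size of 4 are assumptions. In practice Gym's `AtariPreprocessing` and `FrameStack` wrappers cover the same ground.

```python
from collections import deque


def to_grayscale(frame):
    """Convert an H x W frame of (r, g, b) tuples to H x W luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


def downsample(frame, factor=2):
    """Keep every `factor`-th pixel in both dimensions (assumed factor)."""
    return [row[::factor] for row in frame[::factor]]


class FrameStacker:
    """Hypothetical helper: holds the last `k` preprocessed frames."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, fill the stack with copies of the first frame.
        processed = downsample(to_grayscale(frame))
        for _ in range(self.k):
            self.frames.append(processed)
        return list(self.frames)

    def step(self, frame):
        # Push the newest frame; the deque drops the oldest automatically.
        self.frames.append(downsample(to_grayscale(frame)))
        return list(self.frames)
```

You would call `reset` with the first raw frame of an episode and `step` with each frame after that, then feed the returned stack to the network in place of a single raw frame.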