Reinforcement-learning-with-tensorflow
DPPO not converging
I tried your DPPO algorithm with EP_MAX = 8000, and the total moving reward is not converging. Any idea why?