adventures-in-ml-code
Policy Gradient REINFORCE algorithm not converging.
First of all, thank you for the tutorial here!
I am trying to implement and run the code from your tutorial. However, the results do not converge within 500 steps the way they do in the image "Reward: Training progress of Policy Gradient RL in Cartpole environment". Even after 5000 steps, the reward stays around 10. Is this expected?
Thanks again!