adventures-in-ml-code Policy Gradient REINFORCE algorithm not converging.

Policy Gradient REINFORCE algorithm not converging.

Open padmaja-kulkarni opened this issue 5 years ago • 0 comments

First of all, thank you for the tutorial here!

I am trying to run the code from the tutorial; however, the results do not converge after 500 steps, unlike what is shown in the image "Reward: Training progress of Policy Gradient RL in Cartpole environment". Even after 5000 steps, the reward is still around 10. Is this expected?

Thanks again!

May 16 '20 21:05 padmaja-kulkarni
