REINFORCE on CartPole-v0 | Chan's Jupyter
In this post, we will take a hands-on lab of the Monte Carlo Policy Gradient algorithm (also known as REINFORCE) on the OpenAI Gym CartPole-v0 environment. This is a coding exercise from the Udacity Deep Reinforcement Learning Nanodegree.
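For readers who just want the shape of the algorithm before diving in, here is a minimal, self-contained REINFORCE sketch against the legacy gym API (pre-0.26) that this post targets. It is not the notebook's exact code: the `Policy` architecture, `GAMMA`, the learning rate, and the episode count are illustrative assumptions.

```python
import gym
import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributions import Categorical

GAMMA = 0.99  # discount factor (illustrative choice, not the notebook's value)

class Policy(nn.Module):
    """Small two-layer policy network: state -> action probabilities."""
    def __init__(self, state_size=4, hidden_size=16, action_size=2):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return torch.softmax(self.fc2(x), dim=-1)

def run_episode(env, policy):
    """Collect one full episode, returning log-probs and rewards."""
    log_probs, rewards = [], []
    state = env.reset()  # legacy gym (<0.26): reset() returns the observation only
    done = False
    while not done:
        probs = policy(torch.from_numpy(state).float().unsqueeze(0))
        dist = Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())  # legacy 4-tuple step
        rewards.append(reward)
    return log_probs, rewards

env = gym.make('CartPole-v0')
policy = Policy()
optimizer = optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(1000):
    log_probs, rewards = run_episode(env, policy)
    # Monte Carlo return G_t, accumulated backwards over the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + GAMMA * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    # REINFORCE loss: -sum_t log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs).squeeze() * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```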
Nice writeup! I've copied some of this code to try to reproduce the results, but it doesn't seem to be learning as well as you have shown here. I modified the code only minimally, though a few things had to be edited because they were throwing errors... maybe because gym has had updates since.
Hi, thanks for the comment. Yes, this example is based on a legacy version of OpenAI Gym. I plan to migrate this notebook to a new webpage (kcsgoodboy.github.io) with updated packages.
Thanks for your interest.
You're most welcome, and thanks for revisiting such an old project.
The code in the attached Colab notebook actually worked well, so I was really confused about how I was getting different results.
It turns out the issue was that I had copied some things incorrectly; I'm still unsure how that happened. Maybe I was blindly copying from ChatGPT.
I also rolled back to the same versions of torch and gym used in the Colab notebook:
- gym 0.26.2 doesn't work
- gym 0.25.2 works (the API change behind this is sketched below)
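For context (my understanding, worth checking against the release notes): gym 0.26 adopted the new-style API that gymnasium also uses, changing the signatures of both `reset()` and `step()`, which is why the notebook's training loop breaks on 0.26.2. Run under gym 0.25.2, this shows the contrast:

```python
import gym

# gym 0.25.2 -- the API the notebook was written against:
env = gym.make('CartPole-v0')
state = env.reset()                      # returns the observation array only
state, reward, done, info = env.step(0)  # returns a 4-tuple

# gym 0.26+ -- new-style API (same as gymnasium), which breaks the old loop:
#   state, info = env.reset()
#   state, reward, terminated, truncated, info = env.step(0)
#   done = terminated or truncated
```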
On a different note: gymnasium is supposed to be backward compatible with gym, but at least for this example that isn't true. It seems the return value of env.reset() in gymnasium is a tuple (observation, info) instead of just a numpy array, which breaks how you convert the state into a tensor for the forward pass.
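If you want to run the example on gymnasium anyway, a small adaptation along these lines should work: unpack the (observation, info) tuple from reset() and combine the two done flags from step(). The random action here is just a stand-in for the notebook's policy forward pass.

```python
import gymnasium as gym
import torch

env = gym.make('CartPole-v0')
state, info = env.reset()   # gymnasium: reset() returns (observation, info)
done = False
while not done:
    # After unpacking, state is a plain numpy array again, so the
    # notebook-style tensor conversion for the forward pass works:
    state_t = torch.from_numpy(state).float().unsqueeze(0)
    action = env.action_space.sample()  # stand-in for policy(state_t).argmax()
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated      # gymnasium splits done into two flags
env.close()
```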