REINFORCE on CartPole-v0 | Chan's Jupyter
In this post, we will take a hands-on lab of the Monte Carlo Policy Gradient algorithm (also known as REINFORCE) on the OpenAI Gym CartPole-v0 environment. This is a coding exercise from the Udacity Deep Reinforcement Learning Nanodegree.
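For readers who just want the shape of the algorithm before diving in, here is a minimal, self-contained REINFORCE sketch against the legacy gym API (pre-0.26) that this post targets. It is not the notebook's exact code: the `Policy` architecture, `GAMMA`, the learning rate, and the episode count are illustrative assumptions.

```python
import gym
import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributions import Categorical

GAMMA = 0.99  # discount factor (illustrative choice, not the notebook's value)

class Policy(nn.Module):
    """Small two-layer policy network: state -> action probabilities."""
    def __init__(self, state_size=4, hidden_size=16, action_size=2):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return torch.softmax(self.fc2(x), dim=-1)

def run_episode(env, policy):
    """Collect one full episode, returning log-probs and rewards."""
    log_probs, rewards = [], []
    state = env.reset()  # legacy gym (<0.26): reset() returns the observation only
    done = False
    while not done:
        probs = policy(torch.from_numpy(state).float().unsqueeze(0))
        dist = Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())  # legacy 4-tuple step
        rewards.append(reward)
    return log_probs, rewards

env = gym.make('CartPole-v0')
policy = Policy()
optimizer = optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(1000):
    log_probs, rewards = run_episode(env, policy)
    # Monte Carlo return G_t, accumulated backwards over the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + GAMMA * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    # REINFORCE loss: -sum_t log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs).squeeze() * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```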
Nice writeup! I've copied some of this code to try to reproduce the results, but it doesn't seem to be learning as well as you have shown here. I modified the code only minimally, though a few things had to be edited because they were throwing errors... maybe because gym has had updates since.
Hi, thanks for the comment. Yes, this example is based on a legacy version of OpenAI Gym. I plan to migrate this notebook to a new webpage (kcsgoodboy.github.io) with updated packages.
Thanks for your interest.
You're most welcome, and thanks for revisiting such an old project.
The code in the attached Colab notebook actually worked well, so I was really confused about how I was getting different results.
It turns out the issue was that I had copied some things incorrectly; I'm still unsure how that happened. Maybe I was blindly copying from ChatGPT.
I also rolled back to the same versions of torch and gym used in the Colab notebook:
- gym 0.26.2 doesn't work
- gym 0.25.2 works (the API change behind this is sketched below)
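For context (my understanding, worth checking against the release notes): gym 0.26 adopted the new-style API that gymnasium also uses, changing the signatures of both `reset()` and `step()`, which is why the notebook's training loop breaks on 0.26.2. Run under gym 0.25.2, this shows the contrast:

```python
import gym

# gym 0.25.2 -- the API the notebook was written against:
env = gym.make('CartPole-v0')
state = env.reset()                      # returns the observation array only
state, reward, done, info = env.step(0)  # returns a 4-tuple

# gym 0.26+ -- new-style API (same as gymnasium), which breaks the old loop:
#   state, info = env.reset()
#   state, reward, terminated, truncated, info = env.step(0)
#   done = terminated or truncated
```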
On a different note: gymnasium is supposed to be backward compatible with gym, but at least for this example that isn't true. It seems the return value of env.reset() in gymnasium is a tuple (observation, info) instead of just a numpy array, which breaks how you convert the state into a tensor for the forward pass.
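If you want to run the example on gymnasium anyway, a small adaptation along these lines should work: unpack the (observation, info) tuple from reset() and combine the two done flags from step(). The random action here is just a stand-in for the notebook's policy forward pass.

```python
import gymnasium as gym
import torch

env = gym.make('CartPole-v0')
state, info = env.reset()   # gymnasium: reset() returns (observation, info)
done = False
while not done:
    # After unpacking, state is a plain numpy array again, so the
    # notebook-style tensor conversion for the forward pass works:
    state_t = torch.from_numpy(state).float().unsqueeze(0)
    action = env.action_space.sample()  # stand-in for policy(state_t).argmax()
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated      # gymnasium splits done into two flags
env.close()
```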