Youtube-Code-Repository
Policy Gradient, SAC doesn't learn
Hi! I have a few more questions about the code that I don't quite get.
First, I was wondering what pybullet_envs is for. I installed the library but got errors when I tried to import it. I also don't see where it's being used.
Second, I was getting really bad scores when I ran the code. I cloned the code from your git and changed a few things, as follows. First, I changed the environment to `env = gym.make("InvertedPendulum-v4")`. As a result, I also changed the reset call to `obs, _ = env.reset()` and the step call to `obs_, reward, done, *_ = env.step(action)`. Finally, I commented out the lines in sac_torch.py that use `reparameterize=True`, since I ran into NaN tensors when calling `rsample()`.
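For context on the `rsample()` change: SAC's actor update differentiates through the sampled action, and the reparameterized sample is what makes that possible, so disabling it may stop the actor from learning rather than fix the NaNs. Below is a minimal scalar sketch in plain Python of a reparameterized, tanh-squashed sample with two common numerical guards. It is not the code from sac_torch.py; the function name `sample_squashed`, the clamp bounds, and the `EPS` value are assumptions chosen for illustration.

```python
import math
import random

# Assumed bounds and epsilon, chosen for illustration only.
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0
EPS = 1e-6


def sample_squashed(mu, log_std, noise=None):
    """Return (action, log_prob) for a = tanh(mu + sigma * noise)."""
    # Clamping log_std keeps sigma = exp(log_std) finite and positive.
    log_std = min(max(log_std, LOG_STD_MIN), LOG_STD_MAX)
    sigma = math.exp(log_std)

    # Reparameterization: the randomness lives in `noise`, so the
    # action is a deterministic function of mu and sigma.
    if noise is None:
        noise = random.gauss(0.0, 1.0)
    u = mu + sigma * noise
    action = math.tanh(u)

    # Gaussian log-density of u, written in terms of the noise.
    log_prob = -0.5 * noise ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables for tanh; EPS keeps the log finite
    # when |action| saturates at 1.
    log_prob -= math.log(1.0 - action ** 2 + EPS)
    return action, log_prob
```

If either guard is missing, an unbounded standard deviation or a saturated action can produce infinite or undefined log-probabilities, which is one common route to NaNs in the loss and then in the network outputs.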
That's all I've changed, and when I run the code, the score actually decreases (oddly enough). It starts with a score of approx 10 like a random agent, and decreases down to 3 or 4 after 250 episodes.
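One detail of the `env.step` change above may be worth checking. Assuming a Gym version where `step` returns five values (which the `obs, _ = env.reset()` change suggests), the unpacking `obs_, reward, done, *_` keeps only the termination flag and silently drops the truncation flag. The sketch below shows one way to handle both return shapes explicitly; `unpack_step` and `episode_over` are hypothetical helper names, not part of Gym or of the repository.

```python
def unpack_step(result):
    """Normalize a step result to (obs, reward, terminated, truncated, info).

    Accepts the five-value form (obs, reward, terminated, truncated, info)
    and the older four-value form (obs, reward, done, info).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
    elif len(result) == 4:
        obs, reward, done, info = result
        # The older form has a single flag; treat it as termination.
        terminated, truncated = done, False
    else:
        raise ValueError(f"unexpected step result of length {len(result)}")
    return obs, reward, bool(terminated), bool(truncated), info


def episode_over(terminated, truncated):
    """The episode loop should stop on either flag."""
    return terminated or truncated
```

A typical use would be `obs_, reward, terminated, truncated, info = unpack_step(env.step(action))`, ending the episode loop when `episode_over(terminated, truncated)` is true, while storing only `terminated` as the done flag for the value target so that time-limit cutoffs are not treated as true terminal states.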
Would you have any idea of why this is happening? It would be so greatly appreciated!
Thanks a lot for your time