Youtube-Code-Repository Policy Gradient, SAC doesn't learn

Open Ling01234 opened this issue 1 year ago • 2 comments

Hi! I have a few more questions about the code that I don't quite get.

First, I was wondering what pybullet_envs is for. I installed the library but got errors when I tried to import it. I also don't see where it's being used.

Second, I was getting really bad scores when I ran the code. I cloned the code from your git and changed a few things, as follows. The first thing I changed is the environment: I switched it to env = gym.make("InvertedPendulum-v4"), and as a result I also changed the reset and step calls to obs, _ = env.reset() and obs_, reward, done, *_ = env.step(action). Finally, I commented out the lines in sac_torch.py where we use reparameterize=True, since I ran into some NaN tensors when calling rsample().
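For reference, here is a minimal sketch of what the reset/step change above amounts to. It assumes the newer Gym API (0.26 and later), where env.step() returns five values (obs, reward, terminated, truncated, info), so unpacking with done, *_ keeps only the terminated flag and drops truncated. The names unpack_step, StubEnv, and run_episode are hypothetical helpers made up for this illustration; they are not part of the repository or of Gym, and StubEnv only stands in for a real environment so the loop can run without MuJoCo installed.

```python
from typing import Any, Tuple


def unpack_step(result: Tuple[Any, ...]) -> Tuple[Any, float, bool, dict]:
    """Normalize env.step() output to (obs, reward, done, info).

    Accepts the older 4-tuple (obs, reward, done, info) and the newer
    5-tuple (obs, reward, terminated, truncated, info).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # The episode is over if it failed (terminated) or hit the time limit (truncated).
        done = bool(terminated or truncated)
    elif len(result) == 4:
        obs, reward, done, info = result
        done = bool(done)
    else:
        raise ValueError(f"unexpected step() result of length {len(result)}")
    return obs, float(reward), done, info


class StubEnv:
    """Hypothetical stand-in for a Gym >= 0.26 environment (demo only)."""

    def __init__(self, time_limit: int = 5) -> None:
        self.time_limit = time_limit
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0], {}  # (observation, info)

    def step(self, action):
        self.t += 1
        # Never "fails"; the episode only ends when the time limit is reached.
        truncated = self.t >= self.time_limit
        return [float(self.t)], 1.0, False, truncated, {}


def run_episode(env) -> float:
    """Run one episode and return the total reward."""
    obs, _ = env.reset()
    score, done = 0.0, False
    while not done:
        action = 0.0  # placeholder for the agent's action selection
        obs, reward, done, _ = unpack_step(env.step(action))
        score += reward
    return score


if __name__ == "__main__":
    print(run_episode(StubEnv(time_limit=5)))
```

With done, *_ = env.step(action) the loop on StubEnv would never exit, because terminated stays False and the truncated flag is discarded. Combining both flags is one way to avoid that, though it is probably unrelated to the falling scores.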

That's all I've changed, and when I run the code, the score actually decreases (oddly enough). It starts at approximately 10, like a random agent, and drops to 3 or 4 after 250 episodes.

Would you have any idea of why this is happening? It would be so greatly appreciated!

Thanks a lot for your time

Apr 06 '23 02:04 Ling01234
