pytorch-soft-actor-critic
Unable to reproduce results on Humanoid-v2 in new SAC
I am unable to reproduce the result reported in the paper 'Soft Actor-Critic Algorithms and Applications' on the OpenAI Gym environment Humanoid-v2. After 10 million steps my result is around 6000, while the original paper reports about 8000.
Do you know what might be causing this issue? Thank you!
Hmm.....
Don't know why this would happen, although I have never tested on Humanoid for 10 million steps.
The result of 8000 on Humanoid is for a learned temperature (alpha). For a fixed alpha, I think a result of 6000 is alright.
I don't know if you changed the argument automatic_entropy_tuning to True (by default it is False). For --automatic_entropy_tuning = False, 6000 is the expected result.
I just ran Humanoid for 10 million steps, and unfortunately cannot reproduce the problems you're observing.
Here are the results I see across 2 seeds:
Maybe there's something different with the arguments or environment you used? Also, which mujoco version are you using? P.S. running the env for 10 million steps twice costs a lot :stuck_out_tongue_closed_eyes:. But I will run again if you can be more specific :grimacing:.
Thank you very much!!! The experiment parameters follow the original code: Namespace(alpha=0.2, automatic_entropy_tuning=True, batch_size=256, env_name='Humanoid-v2', eval=True, gamma=0.99, hidden_size=256, lr=0.0003, num_steps=10000001, policy='Gaussian', replay_size=1000000, seed=0, start_steps=10000, target_update_interval=1, tau=0.005, updates_per_step=1). The gym version is 0.14.0 and mujoco_py is 1.50.1.68. The MuJoCo physics engine version is 150.
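Since full Namespace dumps are being compared across runs in this thread, one quick sanity check is to diff them programmatically instead of by eye. A small sketch using the settings posted above (the second run here is hypothetical, just to show the idea):

```python
from argparse import Namespace

# Settings reported in this thread (verbatim from the Namespace dump).
reported = Namespace(
    alpha=0.2, automatic_entropy_tuning=True, batch_size=256,
    env_name='Humanoid-v2', eval=True, gamma=0.99, hidden_size=256,
    lr=0.0003, num_steps=10000001, policy='Gaussian',
    replay_size=1000000, seed=0, start_steps=10000,
    target_update_interval=1, tau=0.005, updates_per_step=1)

# Hypothetical second run to compare against, e.g. with tuning off.
other = Namespace(**{**vars(reported), 'automatic_entropy_tuning': False})

def diff_args(a, b):
    """Return only the settings that differ between two Namespaces."""
    da, db = vars(a), vars(b)
    return {k: (da[k], db.get(k)) for k in da if da[k] != db.get(k)}

print(diff_args(reported, other))
# -> {'automatic_entropy_tuning': (True, False)}
```

If the diff is empty and the scores still disagree, the remaining suspects are the library versions (gym, mujoco_py, and the MuJoCo engine itself) and seed variance.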
I used almost the same parameters, except with --automatic_entropy_tuning = True, for 10 million steps, and I got the following result:
I only ran the experiment once, but I couldn't reproduce the score of 8000 either. Would @pranz24 mind sharing the parameters? My gym version is 0.10.9 and mujoco_py is 1.50.1.68. The MuJoCo physics engine version is 200.
For --automatic_entropy_tuning = False, 6000 is the expected result.
For a fixed temperature, the results should be around 6000. You can also check this in the paper.
Hi, I have the same problem. I ran the code with automatic_entropy_tuning = True, but the result is still around 6000. Would you mind sharing the running config for your curve? Thank you very much.