pytorch-soft-actor-critic

Unable to reproduce results on Humanoid-v2 in new SAC

Open zwfightzw opened this issue 5 years ago • 6 comments

I am unable to obtain the result reported in the paper 'Soft Actor-Critic Algorithms and Applications' on the OpenAI Gym environment Humanoid-v2. After 10 million steps I get a return of about 6000, while the original paper reports about 8000.

Do you know what might be causing this issue? Thank you!

zwfightzw avatar Aug 30 '19 02:08 zwfightzw

Hmm, I'm not sure why this would happen, although I have never tested on Humanoid for 10 million steps. The result of 8000 on Humanoid is for a learned temperature (alpha). For fixed alpha, I think a result of around 6000 is expected. Did you change the argument automatic_entropy_tuning to True (by default it is False)? With --automatic_entropy_tuning=False, 6000 is the expected result.
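In case it helps clarify the difference: below is a rough sketch of what the learned-temperature (automatic entropy tuning) update looks like in SAC-style code. It is only an illustration; the variable names and exact structure are assumptions and may not match this repository line for line.

```python
import torch
from torch.optim import Adam

# Sketch of automatic entropy tuning (learned alpha) as described in
# "Soft Actor-Critic Algorithms and Applications".
# Names here are illustrative, not copied from this repo.

action_dim = 17                                  # Humanoid-v2 has a 17-dim action space
target_entropy = -float(action_dim)              # common heuristic: target entropy = -|A|
log_alpha = torch.zeros(1, requires_grad=True)   # optimize log(alpha) so alpha stays positive
alpha_optim = Adam([log_alpha], lr=3e-4)

def update_alpha(log_pi):
    """One temperature update; log_pi are log-probs of actions sampled from the current policy."""
    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()                # alpha used in the actor/critic losses
```

With --automatic_entropy_tuning=False this update is skipped and alpha stays fixed at the constant value passed via --alpha.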

pranz24 avatar Aug 30 '19 04:08 pranz24

I just ran Humanoid for 10 million steps, and unfortunately cannot reproduce the problems you're observing. Here are the results I see across 2 seeds: [Screenshot from 2019-09-22: training curves for two seeds]

Maybe there's something different about the arguments or environment you used? Also, which MuJoCo version are you using? P.S. Running the env for 10 million steps twice costs a lot :stuck_out_tongue_closed_eyes:, but I will run it again if you can be more specific :grimacing:.

pranz24 avatar Sep 22 '19 06:09 pranz24

Thank you very much! The experiment uses the parameter settings from the original code: Namespace(alpha=0.2, automatic_entropy_tuning=True, batch_size=256, env_name='Humanoid-v2', eval=True, gamma=0.99, hidden_size=256, lr=0.0003, num_steps=10000001, policy='Gaussian', replay_size=1000000, seed=0, start_steps=10000, target_update_interval=1, tau=0.005, updates_per_step=1). The Gym version is 0.14.0, mujoco_py is 1.50.1.68, and the MuJoCo physics engine version is 150.

zwfightzw avatar Sep 24 '19 00:09 zwfightzw


I used almost the same parameters, except that I set --automatic_entropy_tuning=True, for 10 million steps, and I got the following result: [training curve image] I only ran the experiment once, but I couldn't reproduce the score of 8000 either. Would @pranz24 mind sharing the parameters? My Gym version is 0.10.9, mujoco_py is 1.50.1.68, and the MuJoCo physics engine version is 200.
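Since the two runs above are on different Gym and MuJoCo versions, it may be worth printing the exact setup from Python before comparing curves. A minimal check, assuming gym and mujoco_py expose __version__ as in recent releases:

```python
# Minimal environment/version check (assumes a working MuJoCo + mujoco_py install)
import gym
import mujoco_py

print("gym:", gym.__version__)
print("mujoco_py:", mujoco_py.__version__)

env = gym.make("Humanoid-v2")
print("observation dim:", env.observation_space.shape)
print("action dim:", env.action_space.shape)
```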

qyz55 avatar Dec 16 '19 07:12 qyz55

> For --automatic_entropy_tuning=False, 6000 is the expected result.

For a fixed temperature, the result should be around 6000; you can also check this in the paper.

pranz24 avatar Dec 17 '19 05:12 pranz24

Hi, I have the same problem. I ran the code with automatic_entropy_tuning=True, but the result is still around 6000. Would you mind sharing the running config for your curve? Thank you very much.

xfdywy avatar Oct 17 '20 03:10 xfdywy