Parameters used for motion imitation

Open ManifoldFR opened this issue 4 years ago • 6 comments

Hello,

I am trying to use this algorithm (rewritten in PyTorch with Gym vectorized envs) for motion imitation, starting with the PyBullet implementation of the DeepMimic environment. In the paper, section 5.3, there is a comparison of DeepMimic's modified off-policy PPO with AWR and RWR on some of DeepMimic's tasks, but no information is given about the hyperparameters used there.

The appendix gives some parameters which I think apply to the usual MuJoCo benchmarks, but I'm not sure if they also apply to the DeepMimic tasks (for instance the MLP hidden dimensions of (128, 64) don't seem right for DeepMimic since the original paper uses (1024, 512)).

ManifoldFR avatar Jun 06 '20 08:06 ManifoldFR

Sure, here are the hyperparameters for the motion imitation tasks with the humanoid:

"actor_net_layers": [1024, 512], "actor_stepsize": 0.0000015, "actor_momentum": 0.9, "actor_init_output_scale": 0.01, "actor_batch_size": 256, "actor_steps": 200, "action_std": 0.05,

"critic_net_layers": [1024, 512], "critic_stepsize": 0.01, "critic_momentum": 0.9, "critic_batch_size": 256, "critic_steps": 100,

"discount": 0.95, "samples_per_iter": 4096, "replay_buffer_size": 50000, "normalizer_samples": 1000000,

"weight_clip": 50, "td_lambda": 0.95, "temp": 1.0,

xbpeng avatar Jun 06 '20 17:06 xbpeng

Thanks! I have a couple of other questions: were actions normalized as in the original DeepMimic code, and was MPI used to speed up data collection and training?

ManifoldFR avatar Jun 07 '20 11:06 ManifoldFR

Yes, actions were also normalized. Besides using AWR instead of PPO, the rest of the setup was the same.
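
For reference, this is roughly what that looks like: a sketch of a fixed per-dimension offset/scale normalizer mapping the action range to [-1, 1]. The actual DeepMimic code builds its offsets and scales from the character's action bounds, so treat the construction below as an assumption and the class name as a placeholder.

```python
import numpy as np

# Hypothetical action normalizer mapping [low, high] per dimension to [-1, 1],
# in the offset/scale style of DeepMimic-like code.
class ActionNormalizer:
    def __init__(self, low: np.ndarray, high: np.ndarray):
        self.offset = -0.5 * (high + low)  # shift the range to be centered at 0
        self.scale = 2.0 / (high - low)    # squash the range width to 2

    def normalize(self, action):           # env units -> policy units
        return (action + self.offset) * self.scale

    def unnormalize(self, action):         # policy units -> env units
        return action / self.scale - self.offset
```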

xbpeng avatar Jun 07 '20 15:06 xbpeng

In the paper's appendix C it is said that a temperature of 0.05 is used with step size 0.00005, though the config file in this repo sets the temperature to 1.0 and changes the learning rates -- which one should be used? I can see the tradeoff this parameter controls: in my experiments, adjusting it made the difference between being able to train on an environment and not.

ManifoldFR avatar Jun 07 '20 21:06 ManifoldFR

In the code we are using advantage normalization, so the temperature is just set to 1.0. The temp of 0.05 was used without advantage normalization. If you are using the code, a temp of 1 should work for the tasks.
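
For concreteness, that combination amounts to something like the sketch below, assuming the normalization is a standard z-score over the batch (check the repo for the exact variant):

```python
import torch

def awr_weights(advantages, temp=1.0, weight_clip=50.0):
    # Standardize advantages within the batch so that a fixed temp of 1.0
    # behaves consistently regardless of the task's reward scale.
    adv = (advantages - advantages.mean()) / (advantages.std() + 1e-5)
    return torch.exp(adv / temp).clamp(max=weight_clip)
```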

xbpeng avatar Jun 07 '20 21:06 xbpeng

Thanks! I'm interested in how the temperature and weight clip interact: I guess having a lot of weights clipped should be bad news, right? Intuitively, if half of the weights are clipped to 20, you lose information about the relative quality of the corresponding actions in the gradient -- perhaps I'll look into it.
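
One cheap way to look into it would be logging how often the clip is active each iteration, e.g. with a hypothetical diagnostic like:

```python
import torch

def clip_fraction(advantages, temp=1.0, weight_clip=50.0):
    # Fraction of samples whose exponentiated advantage hit the clip;
    # a high value means the gradient no longer ranks those actions.
    raw_weights = torch.exp(advantages / temp)
    return (raw_weights >= weight_clip).float().mean().item()
```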

ManifoldFR avatar Jun 07 '20 21:06 ManifoldFR