pytorch-a2c-ppo-acktr-gail

I couldn't get good results for GAIL in any environment except HalfCheetah.

Open · slee01 opened this issue 5 years ago · 3 comments

Hi, first of all, thank you for sharing your code.

I've been trying to implement GAIL using the expert demonstrations from your Google Drive. I used the hyper-parameters from gail_experts/readme and got good results on HalfCheetah. However, the results on the other environments, such as Hopper, Ant, and Walker2d, were worse than I expected. (I couldn't test Reacher; I suspect something is wrong with its expert data, which is only 240 KB.) I tried again with different hyper-parameters, including different seeds, but unfortunately got the same results. Could you share the parameters you used for the environments where I failed? It would help a lot with the comparison tests for my research.

slee01 · Aug 30 '19

For the moment, the easiest way to fix the problem is to change the reward function and turn reward normalization off: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/algo/gail.py#L98

See the comments here: https://github.com/openai/imitation/blob/99fbccf3e060b6e6c739bdf209758620fcdefd3c/policyopt/imitation.py#L146

You need to use this reward specifically:

rewards_B = -tensor.log(1.-tensor.nnet.sigmoid(scores_B))
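For anyone not using Theano, the same reward can be computed in plain Python. This is a minimal sketch, assuming `score` is the discriminator's raw logit with larger values meaning "more expert-like"; the function names are hypothetical. Since -log(1 - sigmoid(s)) equals softplus(s) = log(1 + exp(s)), computing it as a softplus avoids taking log(0) when the sigmoid rounds to 1.

```python
import math


def gail_reward(score: float) -> float:
    """Reward -log(1 - sigmoid(score)), computed stably as softplus(score).

    Assumes `score` is the discriminator's raw logit, where larger values
    mean "more expert-like". The result is always >= 0.
    """
    # softplus(s) = log(1 + exp(s)); branch on the sign so exp() never overflows.
    if score > 0:
        return score + math.log1p(math.exp(-score))
    return math.log1p(math.exp(score))


def naive_gail_reward(score: float) -> float:
    """Direct transcription of the Theano expression.

    Raises ValueError (log of 0) once sigmoid(score) rounds to exactly 1.0.
    """
    sig = 1.0 / (1.0 + math.exp(-score))
    return -math.log(1.0 - sig)


if __name__ == "__main__":
    for s in (-5.0, 0.0, 5.0):
        print(s, gail_reward(s), naive_gail_reward(s))
    # For a large logit the naive version would hit log(0); the stable one returns about `score`.
    print(gail_reward(50.0))
```

Following the suggestion above, this value would be passed to the policy optimizer as-is, with reward normalization turned off.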

ikostrikov · Sep 01 '19

This was very helpful to me.

I found that the standard deviation of the discriminator reward is much higher than that of the MuJoCo environment rewards.

I also understood that the reward range should differ depending on the episode-termination option.
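The termination dependence can be illustrated with the two standard GAN-style rewards. A minimal sketch, assuming `s` is the discriminator logit with larger values meaning more expert-like: log(sigmoid(s)) is always <= 0, so each extra step costs reward and an agent may learn to end episodes early, while -log(1 - sigmoid(s)) is always >= 0, so each extra step adds reward. The logit value and episode lengths below are hypothetical.

```python
import math


def log_sigmoid(s: float) -> float:
    """log(sigmoid(s)) computed stably; always <= 0."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))


def softplus(s: float) -> float:
    """-log(1 - sigmoid(s)) == log(1 + exp(s)); always >= 0."""
    if s > 0:
        return s + math.log1p(math.exp(-s))
    return math.log1p(math.exp(s))


def episode_return(per_step_reward: float, steps: int) -> float:
    """Undiscounted return of an episode that earns a constant reward per step."""
    return per_step_reward * steps


if __name__ == "__main__":
    s = -1.0  # hypothetical logit for a mediocre, not very expert-like state
    for name, r in (("log D", log_sigmoid(s)), ("-log(1-D)", softplus(s))):
        short = episode_return(r, 20)    # agent falls over after 20 steps
        long = episode_return(r, 1000)   # agent survives the full episode
        print(f"{name:>10}: r={r:+.3f}  fall early={short:+.1f}  survive={long:+.1f}")
```

With log D, falling early yields the higher (less negative) return; with -log(1-D), surviving does. In the Gym MuJoCo tasks, Hopper, Walker2d, and Ant end the episode when the robot falls, whereas HalfCheetah only stops at the time limit, which may be why HalfCheetah trained well even with the original reward.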

I finally got good results after modifying the reward function.

But I'm not sure why the value network can be trained without reward normalization.

I'm also wondering whether there is a particular reason you normalize the discriminator reward, given that its standard deviation is so high.

I think clipping is more appropriate than normalization for the discriminator reward.
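The two options can be compared side by side. The sketch below assumes "normalization" means dividing each reward by a running standard deviation of the discounted return (a common scheme; whether it matches the repository exactly is an assumption) and "clipping" means bounding the reward to a fixed interval. All class and function names, and the bound of 10, are hypothetical.

```python
import math


class RunningStd:
    """Welford-style running (population) standard deviation of a scalar stream."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Fall back to 1.0 until there are at least two samples.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 1.0


class ReturnNormalizer:
    """Divide rewards by the running std of the discounted return (hypothetical)."""

    def __init__(self, gamma: float = 0.99, eps: float = 1e-8) -> None:
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0
        self.stats = RunningStd()

    def __call__(self, reward: float, done: bool) -> float:
        self.ret = self.ret * self.gamma + reward
        self.stats.update(self.ret)
        if done:
            self.ret = 0.0  # start a fresh discounted return next episode
        return reward / (self.stats.std + self.eps)


def clip_reward(reward: float, bound: float = 10.0) -> float:
    """Clip to [-bound, bound]; the default bound is an arbitrary illustrative choice."""
    return max(-bound, min(bound, reward))
```

Normalization rescales every reward by a factor that drifts as the discriminator changes, whereas clipping keeps a fixed scale and only alters rewards beyond the bound. Neither subtracts a mean here, so the sign of each reward is preserved.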

Could you comment on these questions, please?

Thanks!

slee01 · Sep 02 '19

Hi, I have a similar problem: my GAIL results are always bad. Could you share your experience with this problem in more detail? Thank you very much!

wang88256187 · Nov 28 '19