pytorch-a2c-ppo-acktr-gail

GAIL uses AIRL reward function

Open HareshKarnan opened this issue 5 years ago • 2 comments

I noticed that the predict reward function uses log(D(.)) - log(1 - D(.)) as the reward to update the generator. However, this is the reward function proposed in the AIRL paper, which minimizes the reverse KL divergence rather than the JS divergence as in GAIL. Is it common for implementations to swap out the GAIL reward for the AIRL reward?

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/84a7582477fb0d5c82ad6d850fe476829dddd2e1/a2c_ppo_acktr/algo/gail.py#L103
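For context, here is a minimal sketch of the reward term being discussed, assuming the discriminator returns a logit for a state-action pair (the function and variable names below are illustrative, not the repo's exact code):

```python
import torch

def predict_reward_airl_style(d_logit):
    # s = D(s, a): discriminator's probability that the sample came from the expert
    s = torch.sigmoid(d_logit)
    # log D - log(1 - D): the AIRL-style reward the issue is asking about
    return s.log() - (1 - s).log()
```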

HareshKarnan avatar May 14 '20 06:05 HareshKarnan

I am also confused. If I just want the GAIL loss, should I simply use reward = -(1 - s).log()?

hrwise-nlp avatar Oct 22 '20 12:10 hrwise-nlp

[image: screenshot of the algorithm section from the GAIL paper]

If we look at the algorithm section of the GAIL paper, the proposed reward is log(D(.)), so just use that. For numerical stability, add 1e-8 inside the log term, i.e. log(D(.) + 1e-8), so you don't get a huge negative reward when the discriminator's output is zero.

You can also try -log(1 - D(.) + 1e-8), the alternative GAN loss.
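A rough sketch of the two variants described above, assuming a logit-output discriminator as before (the epsilon constant and function names are illustrative):

```python
import torch

EPS = 1e-8  # small constant to avoid log(0)

def gail_reward(d_logit):
    # log(D + eps): the reward suggested above from the GAIL algorithm section
    s = torch.sigmoid(d_logit)
    return (s + EPS).log()

def gan_alternative_reward(d_logit):
    # -log(1 - D + eps): the alternative GAN-style reward
    s = torch.sigmoid(d_logit)
    return -(1 - s + EPS).log()
```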

HareshKarnan avatar Nov 02 '20 02:11 HareshKarnan