pytorch-a2c-ppo-acktr-gail

GAIL uses AIRL reward function

Open HareshKarnan opened this issue 5 years ago • 2 comments

I noticed that the predict reward function uses log(D(.)) - log(1 - D(.)) as the reward to update the generator. However, this is the reward function proposed in the AIRL paper, which minimizes the reverse KL divergence rather than the JS divergence as in GAIL. Is it common for implementations to swap out the GAIL reward for the AIRL reward?

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/84a7582477fb0d5c82ad6d850fe476829dddd2e1/a2c_ppo_acktr/algo/gail.py#L103
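For context, here is a minimal sketch of the reward term being discussed, assuming the discriminator returns a logit for a state-action pair (the function and variable names below are illustrative, not the repo's exact code):

```python
import torch

def predict_reward_airl_style(d_logit):
    # s = D(s, a): discriminator's probability that the sample came from the expert
    s = torch.sigmoid(d_logit)
    # log D - log(1 - D): the AIRL-style reward the issue is asking about
    return s.log() - (1 - s).log()
```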

HareshKarnan avatar May 14 '20 06:05 HareshKarnan

I am also confused. If I just want the GAIL loss, should I simply use reward = -(1 - s).log()?

hrwise-nlp avatar Oct 22 '20 12:10 hrwise-nlp

[image: screenshot of the algorithm section from the GAIL paper]

If we look at the algorithm section of the GAIL paper, the proposed reward is log(D(.)), so just use that. For numerical stability, add 1e-8 inside the log term, i.e. log(D(.) + 1e-8), so you don't get a huge negative reward when the discriminator's output is zero.

You can also try -log(1 - D(.) + 1e-8), the alternative GAN loss.
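A rough sketch of the two variants described above, assuming a logit-output discriminator as before (the epsilon constant and function names are illustrative):

```python
import torch

EPS = 1e-8  # small constant to avoid log(0)

def gail_reward(d_logit):
    # log(D + eps): the reward suggested above from the GAIL algorithm section
    s = torch.sigmoid(d_logit)
    return (s + EPS).log()

def gan_alternative_reward(d_logit):
    # -log(1 - D + eps): the alternative GAN-style reward
    s = torch.sigmoid(d_logit)
    return -(1 - s + EPS).log()
```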

HareshKarnan avatar Nov 02 '20 02:11 HareshKarnan