imitation
Change adversarial algorithms to collect rollouts first
Description
This PR changes the adversarial algorithms so that, at each iteration, rollouts are collected first, then the discriminator is trained, and finally the generator is trained. This ordering matches Algorithm 1 in the AIRL paper.
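For clarity, the per-iteration ordering proposed here looks roughly like the sketch below. The three callables are hypothetical stand-ins for the corresponding steps in the adversarial trainer, not the library's actual API:

```python
def train_adversarial(n_iterations, collect_rollouts, train_discriminator, train_generator):
    """Sketch of the per-iteration ordering proposed in this PR.

    The three callables are hypothetical stand-ins for the corresponding
    trainer steps; they are not the imitation library's actual API.
    """
    for _ in range(n_iterations):
        # 1. Collect fresh rollouts from the current generator policy.
        rollouts = collect_rollouts()
        # 2. Train the discriminator on expert data vs. the fresh rollouts.
        train_discriminator(rollouts)
        # 3. Train the generator against the updated discriminator reward.
        train_generator(rollouts)
```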
Testing
The proposed change improves the returns obtained on many environments. The table below shows the imitation-to-expert return ratio of the algorithms on several environments. The hyperparameters were tuned for each environment separately; the tuned configuration was then evaluated on five distinct seeds, and the return ratio is the average ratio of the imitation policy's return to the expert's return (a sketch of this computation follows the table).
| Algo \ Env | Ant | Half Cheetah | Hopper | Swimmer | Walker |
|---|---|---|---|---|---|
| GAIL-PR | 0.883 | 0.868 | 1.01 | 0.986 | 0.989 |
| AIRL-PR | -0.04 | 0.993 | 1.01 | 0.926 | 0.270 |
| GAIL-Master | 0.864 | 0.981 | 1.004 | 0.945 | 0.893 |
| AIRL-Master | 0.259 | 0.447 | 1.008 | 0.663 | 0.176 |
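For reference, the metric in the table can be computed roughly as below. This is an illustrative sketch of the return ratio, not the exact evaluation code; the function name and the interpretation as an average of per-seed ratios are assumptions:

```python
import numpy as np

def mean_return_ratio(imitation_returns, expert_return):
    """Average imitation-to-expert return ratio over seeds (illustrative sketch).

    `imitation_returns` holds one mean episode return per seed; `expert_return`
    is the expert policy's mean episode return.
    """
    ratios = np.asarray(imitation_returns) / expert_return
    return float(ratios.mean())

# Example with five hypothetical seeds and an expert averaging 1000.
print(mean_return_ratio([950.0, 980.0, 1010.0, 990.0, 960.0], 1000.0))  # -> 0.978
```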
Thanks a lot @taufeeque9 for adding this change. We will need it for #675!
For my understanding: does the table show comparisons to the previous version of the implementation?
The table compares against the current implementation on the master branch, which hasn't changed since I computed these results. The -PR suffix denotes the modified algorithm implemented in this PR, and -Master denotes the algorithm currently on the master branch.