imitation
Change adversarial algorithms to collect rollouts first
Description
This PR changes the adversarial algorithms so that, at each iteration, rollouts are collected first, then the discriminator is trained, and finally the generator is trained. This ordering matches Algorithm 1 in the AIRL paper.
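For clarity, the per-iteration ordering proposed here looks roughly like the sketch below. The three callables are hypothetical stand-ins for the corresponding steps in the adversarial trainer, not the library's actual API:

```python
def train_adversarial(n_iterations, collect_rollouts, train_discriminator, train_generator):
    """Sketch of the per-iteration ordering proposed in this PR.

    The three callables are hypothetical stand-ins for the corresponding
    trainer steps; they are not the imitation library's actual API.
    """
    for _ in range(n_iterations):
        # 1. Collect fresh rollouts from the current generator policy.
        rollouts = collect_rollouts()
        # 2. Train the discriminator on expert data vs. the fresh rollouts.
        train_discriminator(rollouts)
        # 3. Train the generator against the updated discriminator reward.
        train_generator(rollouts)
```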
Testing
The proposed change improves the returns obtained on many environments. The table below shows the imitation-to-expert return ratio of the algorithms on several environments. The hyperparameters were tuned for each environment separately; the tuned configuration was then evaluated on five distinct seeds, and the return ratio is the average ratio of the imitation policy's return to the expert's return (a sketch of this computation follows the table).
| Algo \ Env | Ant | Half Cheetah | Hopper | Swimmer | Walker |
|---|---|---|---|---|---|
| GAIL-PR | 0.883 | 0.868 | 1.01 | 0.986 | 0.989 |
| AIRL-PR | -0.04 | 0.993 | 1.01 | 0.926 | 0.270 |
| GAIL-Master | 0.864 | 0.981 | 1.004 | 0.945 | 0.893 |
| AIRL-Master | 0.259 | 0.447 | 1.008 | 0.663 | 0.176 |
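For reference, the metric in the table can be computed roughly as below. This is an illustrative sketch of the return ratio, not the exact evaluation code; the function name and the interpretation as an average of per-seed ratios are assumptions:

```python
import numpy as np

def mean_return_ratio(imitation_returns, expert_return):
    """Average imitation-to-expert return ratio over seeds (illustrative sketch).

    `imitation_returns` holds one mean episode return per seed; `expert_return`
    is the expert policy's mean episode return.
    """
    ratios = np.asarray(imitation_returns) / expert_return
    return float(ratios.mean())

# Example with five hypothetical seeds and an expert averaging 1000.
print(mean_return_ratio([950.0, 980.0, 1010.0, 990.0, 960.0], 1000.0))  # -> 0.978
```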
Thanks a lot @taufeeque9 for adding this change. We will need it for #675!
For my understanding: does the table show comparisons to the previous version of the implementation?
The table compares against the current implementation on the master branch, which hasn't changed since I computed these results. The -PR suffix denotes the modified algorithm implemented in this PR, and -Master denotes the algorithm currently on the master branch.