ngsim_env
PPO instead of TRPO

Open Kailiangdong opened this issue 5 years ago • 1 comment

Hello, thank you for sharing your code. May I ask a question about the paper? Since PPO is an upgrade of TRPO, have you considered using PPO instead of TRPO? I am facing this question in my thesis, and I wonder why all the latest GAIL papers still use TRPO.

Thank you very much.

Kailiangdong avatar Nov 12 '19 15:11 Kailiangdong

Great question! The main reason for using TRPO was that the original GAIL paper used TRPO (see Algorithm 1 in the paper). As you mention, PPO builds on TRPO: both optimize an importance-sampled surrogate objective, but PPO replaces TRPO's hard KL-divergence constraint with a clipped objective that is simpler to optimize. I think it would be a great idea to investigate the impact of replacing TRPO with PPO in the GAIL learning loop.
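
For anyone who wants to try this, here is a minimal sketch of the PPO clipped surrogate objective that would take the place of the TRPO policy step. It is not code from this repository: the function name `ppo_clipped_surrogate`, its parameters, and the default `clip_eps=0.2` are assumptions for illustration. In a GAIL loop the `advantages` would presumably be estimated from the discriminator-derived reward rather than an environment reward.

```python
import math

def ppo_clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance-sampling ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data, which is the role the KL constraint plays in TRPO.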

raunakbh92 avatar Nov 12 '19 17:11 raunakbh92