Alexander Nikulin
@vwxyzjn While it will be more explicit, I think we should respect the choice of the author of this method, since he called it APO. > try run python apo_continuous_action.py...
No, for now I vary gae-lambda, as my primary goal is to replicate the results from the paper, not to properly compare with PPO (but I will come to that). All other...
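For context on what gae-lambda controls, here is a minimal sketch of the standard discounted GAE recursion as commonly used in PPO. It is an illustration only, not the average-reward estimator from the APO paper, and the function name `gae_advantages` and its arguments are hypothetical.

```python
def gae_advantages(rewards, values, next_value, dones, gamma=0.99, gae_lambda=0.95):
    """Standard discounted GAE (illustrative; not the APO average-reward variant).

    rewards[t], values[t]: reward and value estimate at step t.
    next_value: value estimate for the state after the last step.
    dones[t]: True if the episode ended after step t.
    """
    advantages = [0.0] * len(rewards)
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        next_v = next_value if t == len(rewards) - 1 else values[t + 1]
        nonterminal = 1.0 - float(dones[t])
        # One-step TD error.
        delta = rewards[t] + gamma * next_v * nonterminal - values[t]
        # gae_lambda sets how much of the later TD errors is mixed in.
        last_adv = delta + gamma * gae_lambda * nonterminal * last_adv
        advantages[t] = last_adv
    return advantages
```

With `gae_lambda=0` each advantage reduces to the one-step TD error; with `gae_lambda=1` it becomes the full discounted return minus the value baseline.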
So, the short conclusion from the first experiments on Swimmer, HalfCheetah, Ant, Walker, Hopper: 1. I can match the paper's performance on Swimmer, HalfCheetah, Ant 2. While on Walker and Hopper performance...
I can restore the original quality on Hopper and Walker, but with quite specific parameters. I don't think it's worth it, given that the results on these environments are only...
> What is this? Is this an option in the gym environment? Yeah, this option disables `done=True` on unsafe states like falling, so that `done` is set only at the time limit....
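To illustrate the effect described above, here is a toy sketch in plain Python. The class `ToyHopper` and all of its fields are hypothetical stand-ins, not the Gym API; the comment does not name the option, though in Gym's MuJoCo v3 environments it appears to correspond to the `terminate_when_unhealthy` keyword (an assumption here).

```python
from dataclasses import dataclass


@dataclass
class ToyHopper:
    """Hypothetical 1-D stand-in for a locomotion task (not the Gym API)."""
    terminate_when_unhealthy: bool = True
    max_episode_steps: int = 5
    min_healthy_height: float = 0.7
    height: float = 1.0
    t: int = 0

    def step(self, fall: float) -> bool:
        """Lower the body by `fall` and return whether the episode is done."""
        self.height -= fall
        self.t += 1
        unhealthy = self.height < self.min_healthy_height
        # Early termination on an unsafe state, only if the flag allows it.
        terminated = unhealthy and self.terminate_when_unhealthy
        # The time limit always ends the episode.
        truncated = self.t >= self.max_episode_steps
        return terminated or truncated


def episode_length(env: ToyHopper, fall_per_step: float) -> int:
    """Run one episode and count the steps taken until done."""
    steps = 0
    done = False
    while not done:
        done = env.step(fall_per_step)
        steps += 1
    return steps
```

With the flag enabled, the episode ends as soon as the toy agent falls below the healthy height; with it disabled, every episode runs until the time limit.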
I also ran an experiment with 3 seeds and 2M steps on HalfCheetah-v3 for comparison; the graphs seem OK to me for now. wandb graphs: https://wandb.ai/howuhh/cleanrlPPO ![W B Chart 19 06 2022,...
@vwxyzjn username is the same: howuhh (link to wandb profile above), also sent the request on Discord
Also, I am a bit skeptical about a separate file for this; maybe I should add it to the base PPO, but with a flag to disable it (on the other hand,...
@vwxyzjn I agree that it will save us a lot of work if these changes actually happen. However, right now I don't see a consensus on the new API
@vwxyzjn it's OK, let's wait for the new API