Alexander Nikulin comments

Results: 47 comments by Alexander Nikulin

Average PPO implementation

@vwxyzjn While it will be more explicit, I think we should respect the choice of the author of this method, since he called it APO. > try run python apo_continuous_action.py...

Average PPO implementation

No, for now I vary gae-lambda, as my primary goal is to replicate the results from the paper, not to properly compare with PPO (but I will come to that). All other...

So, the short conclusion from the first experiments on Swimmer, HalfCheetah, Ant, Walker, Hopper: 1. I can match the paper's performance on Swimmer, HalfCheetah, Ant 2. While on Walker and Hopper performance...

Average PPO implementation

I can restore the original quality on Hopper and Walker, but with quite specific parameters. I don't think it's worth it, given that the results on these environments are only...

Average PPO implementation

> What is this? Is this an option in the gym environment? Yeah, this option disables `done=True` on unsafe states like falling, so that `done` is set only on the time limit....
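To illustrate the behaviour described in this comment, here is a minimal toy sketch, not the actual Gym code. `ToyWalker`, `episode_length`, and the step logic are hypothetical. The flag name `terminate_when_unhealthy` is borrowed from the keyword used by Gym's MuJoCo v3 environments; that it is the exact option discussed in the thread is an assumption.

```python
class ToyWalker:
    """Hypothetical 1-D environment: the agent "falls" when height drops below 0."""

    def __init__(self, terminate_when_unhealthy=True, max_steps=5):
        # Assumption: flag name mirrors the Gym MuJoCo v3 keyword.
        self.terminate_when_unhealthy = terminate_when_unhealthy
        self.max_steps = max_steps
        self.height = 1.0
        self.t = 0

    def reset(self):
        self.height = 1.0
        self.t = 0
        return self.height

    def step(self, action):
        self.height += action
        self.t += 1
        unhealthy = self.height < 0.0      # "unsafe state", e.g. falling
        timeout = self.t >= self.max_steps  # time limit reached
        # With the flag off, only the time limit can end the episode.
        done = timeout or (unhealthy and self.terminate_when_unhealthy)
        info = {"timeout": timeout, "unhealthy": unhealthy}
        return self.height, 1.0, done, info


def episode_length(env, action=-1.0):
    """Run one episode with a constant action and count the steps."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(action)
        steps += 1
    return steps
```

With a constant action that makes the agent fall, the default setting ends the episode at the fall, while disabling the flag lets the episode run until the time limit.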

ppo with timeout handling

I also ran an experiment with 3 seeds and 2M steps on HalfCheetah-v3 for comparison; the graphs seem OK to me for now. wandb graphs: https://wandb.ai/howuhh/cleanrlPPO ![W B Chart 19 06 2022,...

ppo with timeout handling

@vwxyzjn username is the same: howuhh (link to wandb profile above), also sent the request on Discord

ppo with timeout handling

Also, I am a bit skeptical about a separate file for this; maybe I should add it to the base PPO but with a flag to disable it (on the other hand,...
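For context, one common way to handle timeouts behind a flag can be sketched as follows. This is an illustrative assumption, not the code from the pull request: the function `bootstrap_timeouts`, its parameters, and the `handle_timeouts` flag are hypothetical names. The idea shown is that when an episode is cut off by the time limit rather than by a true terminal state, the discounted value estimate of the final observation is added to the last reward.

```python
def bootstrap_timeouts(rewards, terminated, truncated, final_values,
                       gamma=0.99, handle_timeouts=True):
    """Return rewards with gamma * V(final observation) added at time-limit steps.

    rewards      -- per-step rewards
    terminated   -- True where the episode ended in a real terminal state
    truncated    -- True where the episode was cut off by the time limit
    final_values -- value estimate of the last observation at each step
                    (only read where truncated is True)
    """
    if not handle_timeouts:
        # Flag off: behave like the base algorithm and leave rewards untouched.
        return list(rewards)

    adjusted = []
    for r, term, trunc, v_final in zip(rewards, terminated, truncated, final_values):
        if trunc and not term:
            # A timeout is not a real terminal state, so bootstrap from V.
            r = r + gamma * v_final
        adjusted.append(r)
    return adjusted
```

Keeping the logic behind a single boolean like this is one way the change could live in the base file instead of a separate one.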

ppo with timeout handling

@vwxyzjn I agree that it will save us a lot of work if these changes actually happen. However, right now I don't see a consensus on the new API tho

ppo with timeout handling

@vwxyzjn it's ok, let's wait for new API