Alexander Nikulin comments

Results: 47 comments by Alexander Nikulin

ppo with timeout handling

@vwxyzjn Hi! Seems like the API change is in the latest gym release. Should I try to update this PR to the new API now?

Adding Average Reward PPO proposal

Yes, but it will take some time, especially documentation and testing. I think it would be reasonable to start from `apo_continuous_action.py` and compare it only with `ppo_continuous_action.py` as a test.

Removing the regular advantage calculation in PPO

Yup, this can also be easily verified from the GAE formula: with `gae_lambda=1`, all terms except those needed for `n_step_returns` cancel out.
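A quick numerical check of that cancellation can be sketched as follows. This is only an illustration, not code from the PR: the function names `gae_advantages` and `bootstrapped_returns` are hypothetical, and the rollout is assumed to be a single segment with no episode terminations inside it. With `gae_lambda=1` the value terms telescope, so the advantage reduces to the bootstrapped discounted return minus the value estimate.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Standard backward GAE recursion (assumes no terminations in the segment)."""
    T = len(rewards)
    advantages = np.zeros(T)
    last_gae_lam = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        # One-step TD error
        delta = rewards[t] + gamma * next_value - values[t]
        last_gae_lam = delta + gamma * gae_lambda * last_gae_lam
        advantages[t] = last_gae_lam
    return advantages


def bootstrapped_returns(rewards, last_value, gamma=0.99):
    """Discounted returns bootstrapped from the value of the final next state."""
    T = len(rewards)
    returns = np.zeros(T)
    running = last_value
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rng = np.random.default_rng(0)
rewards = rng.normal(size=8)
values = rng.normal(size=8)
last_value = rng.normal()

adv = gae_advantages(rewards, values, last_value, gae_lambda=1.0)
ret = bootstrapped_returns(rewards, last_value)
# With gae_lambda=1 the advantage equals return minus value.
print(np.allclose(adv, ret - values))
```

Handling `done` flags would require masking both `next_value` and `last_gae_lam` at episode boundaries; that is left out here to keep the identity easy to see.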

PPO timeout proper handling

Thanks for the link! I'll take a look and see what I can come up with.

PPO timeout proper handling

After some thinking and sketching on a piece of paper, it seems to me that it could be solved this way (just a proposal for now): ```python # define buffer's here:...

PPO timeout proper handling

After thinking a bit more, it seems that my implementation is not correct because it will not correctly account for the done flag in GAE (`last_gae_lam` in example 2 from...

wrong rendering

I have a `gym.ObservationWrapper` that only concatenates the state dims into one vector (names are in obs_dict below); no other methods are overridden. This is how I create the env: ```python def create_env(hdf5_path="data/lift/ph/low_dim.hdf5", render=False):...

wrong rendering

Yes, it is definitely set to False; I even assert on that.

WIP: SAC-discrete implementation

@timoklein Sometimes the target entropy may just be very high and hard to reach, and the loss can explode (as alpha keeps growing), so I usually tune the coefficient a bit...
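To illustrate why a high target entropy makes alpha grow, here is a minimal sketch. It assumes, since the comment above is truncated, that the coefficient is a scale on the maximum entropy `log(n_actions)` of a discrete policy. The names `target_entropy`, `alpha_loss_grad`, and `scale` are hypothetical and not taken from the PR.

```python
import math


def max_entropy(n_actions):
    """Entropy of the uniform policy, the upper bound for a discrete policy."""
    return math.log(n_actions)


def target_entropy(n_actions, scale=0.9):
    """Assumed form: a fraction `scale` of the maximum entropy."""
    return scale * max_entropy(n_actions)


def alpha_loss_grad(policy_entropy, target):
    """Gradient w.r.t. alpha of the temperature loss alpha * (H - target).

    A negative gradient means gradient descent increases alpha.
    """
    return policy_entropy - target


n_actions = 6
target = target_entropy(n_actions, scale=0.98)
# A fairly deterministic policy has entropy well below the target,
# so the gradient stays negative and alpha keeps increasing.
print(alpha_loss_grad(policy_entropy=0.3, target=target) < 0)
```

Lowering `scale` makes the target reachable by a less uniform policy, so the gradient can change sign and alpha stops growing.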

Roadmap for D4RL

Would it be possible to also remove the dependency on mujoco_py (like it was done in gym as far as I know)? Since downloading mujoco manually every time is extremely...