M. Ernestus

89 comments by M. Ernestus

I re-trained the experts for all of the above-mentioned envs (PPO and SAC where applicable). We can now specify the normalization like this:

```python
"seals/MountainCar-v0": dict(
    normalize=dict(norm_obs=False, norm_reward=True),
    policy_kwargs=dict(
        activation_fn=torch.nn.modules.activation.Tanh,
        ...
```
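For context, here is a minimal sketch of how such a `normalize` entry could be applied when building the training environment, assuming the stable-baselines3 `VecNormalize` wrapper; the `hparams` dict and the surrounding script are illustrative, not the project's actual training code:

```python
import gym
import seals  # noqa: F401 -- registers the seals/* environments (assumed installed)
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Hyperparameter entry in the style of the (truncated) snippet above.
hparams = {
    "seals/MountainCar-v0": dict(
        normalize=dict(norm_obs=False, norm_reward=True),
    ),
}

env_id = "seals/MountainCar-v0"
venv = DummyVecEnv([lambda: gym.make(env_id)])
# Normalize only the rewards; observations pass through unchanged.
venv = VecNormalize(venv, **hparams[env_id]["normalize"])
```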

I can confirm that it works with just one env. The relevant code is in [total_episode_reward_logger](https://github.com/hill-a/stable-baselines/blob/002fb35c43da441567946ad197f92946e4d9b99d/stable_baselines/a2c/utils.py#L562), which is called by PPO2 [here](https://github.com/hill-a/stable-baselines/blob/002fb35c43da441567946ad197f92946e4d9b99d/stable_baselines/ppo2/ppo2.py#L309). To me it is absolutely unclear...

Any hints on what exactly the masks are used for? That would help a lot!
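For what it's worth, here is a sketch of the pattern I assume the masks support in vectorized training: they mark episode ends so per-environment return accumulators can be reset. This is my reading of the general pattern, not a verbatim copy of the stable-baselines implementation, and `accumulate_episode_returns` is a hypothetical helper:

```python
import numpy as np

def accumulate_episode_returns(rew_acc, rewards, masks):
    """Update per-env episode returns from one rollout.

    rew_acc: (n_envs,) running return of the current episode in each env.
    rewards: (n_steps, n_envs) rewards collected during the rollout.
    masks:   (n_steps, n_envs) booleans marking steps where an episode ended.
    Returns the updated accumulator and the returns of completed episodes.
    """
    completed = []
    rew_acc = rew_acc.astype(float).copy()
    for step_rewards, step_dones in zip(rewards, masks):
        rew_acc += step_rewards
        for env_idx, done in enumerate(step_dones):
            if done:
                completed.append(rew_acc[env_idx])  # episode finished: record its return
                rew_acc[env_idx] = 0.0              # reset for the next episode
    return rew_acc, completed

# Example with two parallel envs and a two-step rollout:
rew_acc = np.zeros(2)
rewards = np.array([[1.0, 0.5], [1.0, 0.5]])
masks = np.array([[False, True], [True, False]])
rew_acc, completed = accumulate_episode_returns(rew_acc, rewards, masks)
# completed == [0.5, 2.0]: env 1 finished after step 1, env 0 after step 2.
```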

PPO2 works if we fix the issue with the number of minibatches.
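Assuming the minibatch issue is the usual divisibility constraint (PPO2 splits each rollout of `n_steps * n_envs` transitions into `nminibatches` minibatches, so the rollout size must be divisible by `nminibatches`), here is a minimal sketch of a compatible configuration; the CartPole env is just a placeholder:

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

venv = DummyVecEnv([lambda: gym.make("CartPole-v1")])  # single env, so n_envs = 1

model = PPO2(
    "MlpPolicy",
    venv,
    n_steps=128,     # rollout length per env
    nminibatches=4,  # 128 * 1 transitions split into 4 minibatches of 32
)
model.learn(total_timesteps=10_000)
```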

OK, so I will wait for @Miffyli, so as not to make any of the work in #540 useless?

@rk1a thanks for this huge chunk of work! We are excited to see how we can upstream it. Right now we are focused on releasing v1.0 of `imitation` by...

Thanks for reporting this. See #823 for the warning.

I think the idea of an interactive policy is worth exploring. Maybe the "polling" mechanism won't work for all scenarios because if the polling is interleaved with some learning process,...
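To make the concern concrete, here is a hypothetical sketch of a polling interactive policy; the class and the queue-based UI hand-off are illustrative and not part of the `imitation` API:

```python
import queue

import numpy as np


class PollingInteractivePolicy:
    """Returns human-chosen actions by polling a queue that a UI thread fills."""

    def __init__(self, action_queue: queue.Queue, default_action, poll_timeout: float = 0.1):
        self.action_queue = action_queue
        self.default_action = default_action
        self.poll_timeout = poll_timeout

    def predict(self, obs, deterministic=True):
        # Wait briefly for a human action; fall back to a default if none arrives.
        # If this call is interleaved with a learning loop, the learner stalls for
        # up to `poll_timeout` every step -- the scenario the comment above worries about.
        try:
            action = self.action_queue.get(timeout=self.poll_timeout)
        except queue.Empty:
            action = self.default_action
        return np.asarray([action]), None
```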

Sounds good. I am excited to see what comes out of this exploration!

I think the coverage warning in this one is spurious. Can you merge this, @AdamGleave?