M. Ernestus comments

Results 89 comments of


                                            M. Ernestus

HuggingFace models outdated

I re-trained the experts for all the above mentioned envs (PPO and SAC where applicable). We can now specify the normalization like this: ```python "seals/MountainCar-v0": dict( normalize=dict(norm_obs=False, norm_reward=True), policy_kwargs=dict( activation_fn=torch.nn.modules.activation.Tanh,...

[bug] PPO2 episode reward summaries are written incorrectly for VecEnvs

I can confirm that it works with just one env. The relevant code is in [total_episode_reward_logger](https://github.com/hill-a/stable-baselines/blob/002fb35c43da441567946ad197f92946e4d9b99d/stable_baselines/a2c/utils.py#L562) which is called by PPO2 [here](https://github.com/hill-a/stable-baselines/blob/002fb35c43da441567946ad197f92946e4d9b99d/stable_baselines/ppo2/ppo2.py#L309) and . To me it is absolutely unclear...

M. Ernestus

HuggingFace models outdated

[bug] PPO2 episode reward summaries are written incorrectly for VecEnvs

LSTM policies are broken for PPO1 and TRPO

LSTM policies are broken for PPO1 and TRPO

Code Cleanup

Support asynchronous human preference gathering in RLHP implementation

Run time Error when run quickstart.py

Use data acquired by users

Use data acquired by users

Ensure safe_to_tensor moves tensors to the specified device.