Antonin RAFFIN

Results 880 comments of Antonin RAFFIN

@DanielTakeshi

> Out of curiosity do you have the exact commit for baselines here that corresponds to what stable-baselines uses?

There were several changes/bug fixes afterward, but we forked it (apparently)...

> maybe something happened between then that caused changes in the environment processing code

Maybe. In the stable-baselines code, there is no preprocessing for DDPG.

> you report the "exploration policy"...

@DanielTakeshi I think your intuition was good: https://github.com/openai/baselines/blob/ba2b01782034bcbdb73a2e744cff4cb1c99ab612/baselines/run.py#L116
It seems the normalization is applied twice (and reward normalization is also active by default). Can you check by commenting out this line?
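For context, a minimal sketch of what that line does, assuming the openai/baselines `VecNormalize` API (`ob`/`ret` keyword arguments): if observations are already normalized upstream, this wrap applies the normalization a second time.

```python
import gym

from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.vec_normalize import VecNormalize

env = DummyVecEnv([lambda: gym.make('Pendulum-v0')])

# run.py wraps the (possibly already normalized) env like this;
# ret=True means reward normalization is also on by default.
env = VecNormalize(env, ob=True, ret=True)

# To test the double-normalization hypothesis, skip the wrap,
# or disable one part of it:
# env = VecNormalize(env, ob=False)  # reward normalization only
```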

Hi @hartikainen , I finally managed to make it work on MountainCarContinuous by adding additional noise to the actions of the behavior policy, in the same fashion as DDPG does...
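A minimal sketch of that setup, assuming the stable-baselines API (SAC's `action_noise` argument and the `common.noise` helpers); the noise scale follows the rl zoo value mentioned below:

```python
import numpy as np
import gym

from stable_baselines import SAC
from stable_baselines.common.noise import NormalActionNoise

env = gym.make('MountainCarContinuous-v0')

# Extra Gaussian noise on the behavior policy's actions,
# in the same fashion DDPG explores (std=0.5, as in the rl zoo)
n_actions = env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.5 * np.ones(n_actions))

model = SAC('MlpPolicy', env, action_noise=action_noise,
            ent_coef='auto', verbose=1)
model.learn(total_timesteps=50000)
```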

Update: DDPG seems to suffer from the same issue with sparse rewards but the other way around: it works in the -1/0 setting and fails in the 0/1 one. Using...
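For concreteness, a hypothetical gym wrapper (not from the original comment) to switch between the two sparse-reward conventions by a constant offset:

```python
import gym

class ShiftReward(gym.RewardWrapper):
    """Shift a 0/1 sparse reward to -1/0 (offset=-1.0), or back (offset=+1.0)."""

    def __init__(self, env, offset=-1.0):
        super(ShiftReward, self).__init__(env)
        self.offset = offset

    def reward(self, reward):
        return reward + self.offset
```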

Hello,
You can find working hyperparameters in the [rl zoo](https://github.com/araffin/rl-baselines-zoo); the noise standard deviation is quite high (0.5, compared to the "classic" values of 0.1-0.2 normally used).

> automatic ent_coef

This is just for convenience; the external noise scale is what makes things work.

> Did you mean MountainCarContinuous-v0 could be solved by SAC + HER?

Ah no, I...

As discussed, that's ok when using "image net" normalisation; however, you should get better results with an activation when using "tf" normalisation.

This is unexpected, but why not, if the result is the same.

I couldn't reproduce your results... minimal code:

```python
import numpy as np

def prepro(x, mode='one'):
    x /= 255.
    if mode == 'one':
        x -= 0.5
        x *= 2.
    else:
        x[...,
```
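Since the snippet is cut off, here is a runnable reconstruction; the `else` branch is an assumption (ImageNet-style per-channel mean subtraction with the usual values), not the original code:

```python
import numpy as np

def prepro(x, mode='one'):
    x = x.astype(np.float64)  # cast so in-place float ops work on uint8 input
    x /= 255.
    if mode == 'one':
        # "tf" normalisation: scale to [-1, 1]
        x -= 0.5
        x *= 2.
    else:
        # Assumed "image net" normalisation: subtract per-channel means
        # (the original snippet is truncated here, values are assumed)
        x[..., 0] -= 0.485
        x[..., 1] -= 0.456
        x[..., 2] -= 0.406
    return x

# Example: both modes on a random uint8 image
img = np.random.randint(0, 256, size=(84, 84, 3), dtype=np.uint8)
print(prepro(img, mode='one').min(), prepro(img, mode='one').max())
print(prepro(img, mode='imagenet').mean())
```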