Antonin RAFFIN

Results 880 comments of Antonin RAFFIN

@DanielTakeshi

> Out of curiosity do you have the exact commit for baselines here that corresponds to what stable-baselines uses?

There were several changes/bug fixes afterward, but we forked it (apparently)...

> maybe something happened between then that caused changes in the environment processing code

Maybe. In the stable-baselines code, there is no preprocessing for DDPG.

> you report the "exploration policy"...

@DanielTakeshi I think your intuition was good: https://github.com/openai/baselines/blob/ba2b01782034bcbdb73a2e744cff4cb1c99ab612/baselines/run.py#L116
It seems the normalization is applied twice (and reward normalization is also active by default). Can you check by commenting out this line?
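For context, a minimal sketch of what that line does, assuming the openai/baselines `VecNormalize` API (`ob`/`ret` keyword arguments): if observations are already normalized upstream, this wrap applies the normalization a second time.

```python
import gym

from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.vec_normalize import VecNormalize

env = DummyVecEnv([lambda: gym.make('Pendulum-v0')])

# run.py wraps the (possibly already normalized) env like this;
# ret=True means reward normalization is also on by default.
env = VecNormalize(env, ob=True, ret=True)

# To test the double-normalization hypothesis, skip the wrap,
# or disable one part of it:
# env = VecNormalize(env, ob=False)  # reward normalization only
```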

Hi @hartikainen , I finally managed to make it work on MountainCarContinuous by adding additional noise to the actions of the behavior policy, in the same fashion as DDPG does...
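A minimal sketch of that setup, assuming the stable-baselines API (SAC's `action_noise` argument and the `common.noise` helpers); the noise scale follows the rl zoo value mentioned below:

```python
import numpy as np
import gym

from stable_baselines import SAC
from stable_baselines.common.noise import NormalActionNoise

env = gym.make('MountainCarContinuous-v0')

# Extra Gaussian noise on the behavior policy's actions,
# in the same fashion DDPG explores (std=0.5, as in the rl zoo)
n_actions = env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.5 * np.ones(n_actions))

model = SAC('MlpPolicy', env, action_noise=action_noise,
            ent_coef='auto', verbose=1)
model.learn(total_timesteps=50000)
```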

Update: DDPG seems to suffer from the same issue with sparse rewards but the other way around: it works in the -1/0 setting and fails in the 0/1 one. Using...
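For concreteness, a hypothetical gym wrapper (not from the original comment) to switch between the two sparse-reward conventions by a constant offset:

```python
import gym

class ShiftReward(gym.RewardWrapper):
    """Shift a 0/1 sparse reward to -1/0 (offset=-1.0), or back (offset=+1.0)."""

    def __init__(self, env, offset=-1.0):
        super(ShiftReward, self).__init__(env)
        self.offset = offset

    def reward(self, reward):
        return reward + self.offset
```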

Hello,
You can find working hyperparameters in the [rl zoo](https://github.com/araffin/rl-baselines-zoo); the noise standard deviation is quite high (0.5, compared to the "classic" values of 0.1-0.2 normally used).

> automatic ent_coef

This is just for convenience; the external noise scale is what makes things work.

> Did you mean MountainCarContinuous-v0 could be solved by SAC + HER?

Ah no, I...

As discussed, that's ok when using "image net" normalisation; however, you should get better results with an activation when using "tf" normalisation.

This is unexpected, but why not, if the result is the same.

I couldn't reproduce your results... minimal code:

```python
import numpy as np

def prepro(x, mode='one'):
    x /= 255.
    if mode == 'one':
        x -= 0.5
        x *= 2.
    else:
        x[...,
```
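Since the snippet is cut off, here is a runnable reconstruction; the `else` branch is an assumption (ImageNet-style per-channel mean subtraction with the usual values), not the original code:

```python
import numpy as np

def prepro(x, mode='one'):
    x = x.astype(np.float64)  # cast so in-place float ops work on uint8 input
    x /= 255.
    if mode == 'one':
        # "tf" normalisation: scale to [-1, 1]
        x -= 0.5
        x *= 2.
    else:
        # Assumed "image net" normalisation: subtract per-channel means
        # (the original snippet is truncated here, values are assumed)
        x[..., 0] -= 0.485
        x[..., 1] -= 0.456
        x[..., 2] -= 0.406
    return x

# Example: both modes on a random uint8 image
img = np.random.randint(0, 256, size=(84, 84, 3), dtype=np.uint8)
print(prepro(img, mode='one').min(), prepro(img, mode='one').max())
print(prepro(img, mode='imagenet').mean())
```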