Antonin RAFFIN

Results 880 comments of Antonin RAFFIN

@skervim well, I don't know, as I'm not in charge of implementing it nor of testing it. However, that does not mean you cannot test it before (cf. install from source...

@skervim if you want to test on continuous envs for free (no MuJoCo licence required), I recommend the PyBullet envs (see the [rl baselines zoo](https://github.com/araffin/rl-baselines-zoo))

Hello,

> It seems to me that when HER samples an achieved goal from the replay buffer it never samples the very last state of the episode.

https://github.com/hill-a/stable-baselines/blob/4fada47f1b71b7548c935b1f01c6fb04199b3d54/stable_baselines/her/replay_buffer.py#L113 the index `[-1]`...

Thanks for the clarification. For #578, it seems normal for the `future` strategy (cf. answer: https://github.com/hill-a/stable-baselines/issues/578#issuecomment-581178005). For the rest, I need to think more about it.
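To make the `future` strategy point concrete, here is a minimal pure-Python sketch of that goal-selection rule (as described in the HER paper, not the stable-baselines implementation itself): for a transition at index `t`, the relabeled goal is an achieved goal drawn from a strictly later transition of the same episode, so the earliest transitions are never sampled as goals. The episode layout and the `sample_future_goal` helper are hypothetical, for illustration only.

```python
import random

def sample_future_goal(episode, t):
    """Sketch of the HER 'future' strategy: pick the achieved goal of a
    transition that comes strictly after index t in the same episode.
    `episode` is a hypothetical list of dicts with an 'achieved_goal' key."""
    # Uniformly choose an index in (t, T], i.e. a later transition.
    future_idx = random.randint(t + 1, len(episode) - 1)
    return episode[future_idx]["achieved_goal"]

# Toy episode: achieved goal at step i is simply i.
episode = [{"achieved_goal": i} for i in range(5)]
print(sample_future_goal(episode, 2))  # either 3 or 4
```

Note that with this rule the goal for transition `t` always comes from a later step, which is why questions about which end-of-episode states can (or cannot) be sampled depend on how the episode boundary is indexed.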

Hello, maybe a duplicate of https://github.com/hill-a/stable-baselines/issues/501, but it really sounds like a bug.

> new_tb_log==False here does not work?

There is an issue about that: https://github.com/hill-a/stable-baselines/issues/599#issuecomment-561709799

Hello, it sounds like you should take a look at @AdamGleave's work (based on stable-baselines): https://github.com/HumanCompatibleAI/adversarial-policies

Hello, thanks for the PR. Please fill in the PR template completely.

This is a breaking change, and I would then change DDPG/SAC/TD3 for consistency so we can fix #526. EDIT: `layers` should be `[]` by default in the case of a...

> which is cleaner/ to implement?

@Miffyli I don't have much time for that issue right now; I trust you to make the right decision ;) (unless you really want my...