Antonin RAFFIN
Beta distribution as policy for environments with bounded continuous action spaces [feature request]
@skervim well, I don't know, as I'm not in charge of implementing it nor of testing it. However, that does not mean you cannot test it beforehand (cf. install from source...
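To make the feature request concrete, here is a minimal, illustrative sketch (not the stable-baselines implementation; the function and argument names are hypothetical) of how a Beta-distributed policy produces actions in a bounded continuous space:

```python
import numpy as np

def beta_policy_action(alpha, beta, low, high, rng=np.random.default_rng()):
    """Sample an action from a Beta policy for a bounded action space.

    alpha, beta: Beta distribution parameters (in practice, outputs of the policy network).
    low, high: bounds of the continuous action space.
    """
    u = rng.beta(alpha, beta)  # sample lies in [0, 1] by construction
    return low + u * (high - low)  # rescale to [low, high]
```

The appeal over a squashed Gaussian is that the Beta distribution has bounded support by construction, so no clipping or tanh-squashing is needed.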
Beta distribution as policy for environments with bounded continuous action spaces [feature request]
@skervim if you want to test on continuous envs for free (no MuJoCo licence required), I recommend the PyBullet envs (see the [rl baselines zoo](https://github.com/araffin/rl-baselines-zoo))
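A usage sketch, assuming `pybullet` is installed via pip (the env id below is one of the Bullet envs registered by `pybullet_envs`):

```python
import gym
import pybullet_envs  # noqa: F401 -- importing this registers the Bullet envs with gym

# Free alternative to MuJoCo's HalfCheetah, usable with stable-baselines
env = gym.make("HalfCheetahBulletEnv-v0")
obs = env.reset()
```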
Hello, > It seems to me that when HER samples an achieved goal from the replay buffer it never samples the very last state of the episode. https://github.com/hill-a/stable-baselines/blob/4fada47f1b71b7548c935b1f01c6fb04199b3d54/stable_baselines/her/replay_buffer.py#L113 The index `[-1]`...
Thanks for the clarification. For #578, it seems normal for the `future` strategy (cf. the answer: https://github.com/hill-a/stable-baselines/issues/578#issuecomment-581178005). For the rest, I need to think more about it.
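For context, a simplified, illustrative sketch of what the `future` strategy does when relabeling a transition (not the actual stable-baselines code; the function and argument names are hypothetical):

```python
import numpy as np

def sample_future_goal(achieved_goals, t, rng=np.random.default_rng()):
    """Pick an achieved goal from a later step of the same episode.

    achieved_goals: one achieved goal per transition of the episode.
    t: index of the transition being relabeled (assumes t < len(achieved_goals) - 1).
    """
    future_t = rng.integers(t + 1, len(achieved_goals))  # strictly after t
    return achieved_goals[future_t]
```

With this strategy, the substitute goal for transition `t` always comes from a strictly later step of the same episode, which explains why some states can never be sampled as goals for certain transitions.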
Hello, maybe a duplicate of https://github.com/hill-a/stable-baselines/issues/501, but it really sounds like a bug.
> `new_tb_log==False` here does not work? There is an issue about that: https://github.com/hill-a/stable-baselines/issues/599#issuecomment-561709799
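For reference, a usage sketch of the related `reset_num_timesteps` argument to `learn()` (in stable-baselines, `new_tb_log` is derived from it internally); the algorithm, env, log name, and directory below are arbitrary choices for illustration:

```python
from stable_baselines import PPO2

model = PPO2("MlpPolicy", "CartPole-v1", tensorboard_log="./tb/")
model.learn(total_timesteps=10000, tb_log_name="run")
# Passing reset_num_timesteps=False continues the same tensorboard curve
# instead of starting a fresh log for the second call.
model.learn(total_timesteps=10000, tb_log_name="run", reset_num_timesteps=False)
```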
Hello, it sounds like you should take a look at @AdamGleave's work (based on stable-baselines): https://github.com/HumanCompatibleAI/adversarial-policies
Hello, thanks for the PR; please fill in the PR template completely.
This is a breaking change, and I would then change DDPG/SAC/TD3 for consistency so we can fix #526. EDIT: `layers` should be `[]` by default in the case of a...
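To illustrate what such a default would mean for user code, a sketch using the existing `policy_kwargs` mechanism (the algorithm and env are arbitrary; whether `layers=[]` is appropriate depends on the case discussed above):

```python
from stable_baselines import SAC

# With layers=[], the policy adds no extra hidden layers on top of the
# feature extractor, which is the default behavior being discussed.
model = SAC("MlpPolicy", "Pendulum-v0", policy_kwargs=dict(layers=[]))
```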
> which is cleaner/ to implement? @Miffyli I don't have much time for that issue right now, I trust you to make the right decision ;) (unless you really want my...