Antonin RAFFIN
Beta distribution as policy for environments with bounded continuous action spaces [feature request]
@skervim well, I don't know, as I'm not in charge of implementing it nor of testing it. However, that does not mean you cannot test it beforehand (cf. install from source...
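To make the feature request concrete, here is a minimal, illustrative sketch (not the stable-baselines implementation; the function and argument names are hypothetical) of how a Beta-distributed policy produces actions in a bounded continuous space:

```python
import numpy as np

def beta_policy_action(alpha, beta, low, high, rng=np.random.default_rng()):
    """Sample an action from a Beta policy for a bounded action space.

    alpha, beta: Beta distribution parameters (in practice, outputs of the policy network).
    low, high: bounds of the continuous action space.
    """
    u = rng.beta(alpha, beta)  # sample lies in [0, 1] by construction
    return low + u * (high - low)  # rescale to [low, high]
```

The appeal over a squashed Gaussian is that the Beta distribution has bounded support by construction, so no clipping or tanh-squashing is needed.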
Beta distribution as policy for environments with bounded continuous action spaces [feature request]
@skervim if you want to test on continuous envs for free (no MuJoCo licence required), I recommend the PyBullet envs (see the [rl baselines zoo](https://github.com/araffin/rl-baselines-zoo))
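A usage sketch, assuming `pybullet` is installed via pip (the env id below is one of the Bullet envs registered by `pybullet_envs`):

```python
import gym
import pybullet_envs  # noqa: F401 -- importing this registers the Bullet envs with gym

# Free alternative to MuJoCo's HalfCheetah, usable with stable-baselines
env = gym.make("HalfCheetahBulletEnv-v0")
obs = env.reset()
```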
Hello, > It seems to me that when HER samples an achieved goal from the replay buffer it never samples the very last state of the episode. https://github.com/hill-a/stable-baselines/blob/4fada47f1b71b7548c935b1f01c6fb04199b3d54/stable_baselines/her/replay_buffer.py#L113 The index `[-1]`...
Thanks for the clarification. For #578, it seems normal for the `future` strategy (cf. the answer: https://github.com/hill-a/stable-baselines/issues/578#issuecomment-581178005). For the rest, I need to think more about it.
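For context, a simplified, illustrative sketch of what the `future` strategy does when relabeling a transition (not the actual stable-baselines code; the function and argument names are hypothetical):

```python
import numpy as np

def sample_future_goal(achieved_goals, t, rng=np.random.default_rng()):
    """Pick an achieved goal from a later step of the same episode.

    achieved_goals: one achieved goal per transition of the episode.
    t: index of the transition being relabeled (assumes t < len(achieved_goals) - 1).
    """
    future_t = rng.integers(t + 1, len(achieved_goals))  # strictly after t
    return achieved_goals[future_t]
```

With this strategy, the substitute goal for transition `t` always comes from a strictly later step of the same episode, which explains why some states can never be sampled as goals for certain transitions.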
Hello, maybe a duplicate of https://github.com/hill-a/stable-baselines/issues/501, but it really sounds like a bug.
> `new_tb_log==False` here does not work? There is an issue about that: https://github.com/hill-a/stable-baselines/issues/599#issuecomment-561709799
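For reference, a usage sketch of the related `reset_num_timesteps` argument to `learn()` (in stable-baselines, `new_tb_log` is derived from it internally); the algorithm, env, log name, and directory below are arbitrary choices for illustration:

```python
from stable_baselines import PPO2

model = PPO2("MlpPolicy", "CartPole-v1", tensorboard_log="./tb/")
model.learn(total_timesteps=10000, tb_log_name="run")
# Passing reset_num_timesteps=False continues the same tensorboard curve
# instead of starting a fresh log for the second call.
model.learn(total_timesteps=10000, tb_log_name="run", reset_num_timesteps=False)
```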
Hello, it sounds like you should take a look at @AdamGleave's work (based on stable-baselines): https://github.com/HumanCompatibleAI/adversarial-policies
Hello, thanks for the PR; please fill in the PR template completely.
This is a breaking change, and I would then change DDPG/SAC/TD3 for consistency so we can fix #526. EDIT: `layers` should be `[]` by default in the case of a...
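To illustrate what such a default would mean for user code, a sketch using the existing `policy_kwargs` mechanism (the algorithm and env are arbitrary; whether `layers=[]` is appropriate depends on the case discussed above):

```python
from stable_baselines import SAC

# With layers=[], the policy adds no extra hidden layers on top of the
# feature extractor, which is the default behavior being discussed.
model = SAC("MlpPolicy", "Pendulum-v0", policy_kwargs=dict(layers=[]))
```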
> which is cleaner/ to implement? @Miffyli I don't have much time for that issue right now, I trust you to make the right decision ;) (unless you really want my...