Antonin RAFFIN comments

769 comments by Antonin RAFFIN

Prioritized Experience Replay for DQN

> How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment...
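The "one segment tree for all envs" option can be made concrete with a minimal sketch of a proportional sum tree in Python/NumPy. This is not SB3 code. `SumTree`, `flat_index` and `sample_indices` are hypothetical names. The stratified sampling (one draw per equal-mass segment) follows the proportional variant described in the PER paper. The key assumption is that transition `(buffer_pos, env_idx)` of a `(buffer_size, n_envs)` buffer is flattened to a single leaf index.

```python
import numpy as np


class SumTree:
    """Sum segment tree for proportional prioritized sampling (illustrative sketch)."""

    def __init__(self, capacity: int) -> None:
        # Round up to a power of two so every leaf sits at the same depth
        size = 1
        while size < capacity:
            size *= 2
        self.size = size
        # Node i has children 2i and 2i + 1; leaves occupy [size, 2 * size)
        self.tree = np.zeros(2 * size, dtype=np.float64)

    def update(self, idx: int, priority: float) -> None:
        pos = idx + self.size
        self.tree[pos] = priority
        pos //= 2
        # Propagate the new partial sums up to the root
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self) -> float:
        return float(self.tree[1])

    def find(self, value: float) -> int:
        """Return the leaf whose cumulative-priority interval contains value."""
        pos = 1
        while pos < self.size:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.size


def flat_index(buffer_pos: int, env_idx: int, n_envs: int) -> int:
    # One tree for all envs: transition (buffer_pos, env_idx) maps to one leaf
    return buffer_pos * n_envs + env_idx


def sample_indices(tree: SumTree, batch_size: int, rng: np.random.Generator) -> np.ndarray:
    # Stratified proportional sampling: one uniform draw per equal-mass segment
    segment = tree.total() / batch_size
    values = (np.arange(batch_size) + rng.random(batch_size)) * segment
    return np.array([tree.find(v) for v in values])


# Usage: 4 buffer slots per env, 2 envs -> 8 leaves in a single tree
buffer_size, n_envs = 4, 2
tree = SumTree(buffer_size * n_envs)
tree.update(flat_index(2, 1, n_envs), 1.0)
idx = sample_indices(tree, batch_size=3, rng=np.random.default_rng(0))
buffer_pos, env_idx = divmod(int(idx[0]), n_envs)  # -> (2, 1)
```

With a single tree, sampling is proportional to priority across all envs at once. A per-env tree would additionally need a rule for how many samples to draw from each env.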

[Question] Wrong scaled_action for continuous actions in `_sample_action()`?

Hello,

> As the comments indicate, the continuous actions obtained in line 395 should have already been scaled by tanh, which puts them in the range (-1, 1). I think...
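The affine maps between a `Box` action space with finite bounds `low`/`high` and the tanh range [-1, 1] can be sketched as follows. The function names mirror the `scale_action`/`unscale_action` methods of SB3 policies, but these standalone versions are written for illustration only and assume finite, per-dimension bounds.

```python
import numpy as np


def scale_action(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * ((action - low) / (high - low)) - 1.0


def unscale_action(scaled_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [-1, 1] back to [low, high]."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)


# Usage: a 2D action space with different bounds per dimension
low = np.array([0.0, -2.0])
high = np.array([1.0, 2.0])
tanh_output = np.tanh(np.array([0.3, -1.5]))  # squashed actor output in (-1, 1)
env_action = unscale_action(tanh_output, low, high)  # what the env would receive
```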

[Question] Wrong scaled_action for continuous actions in `_sample_action()`?

> I wonder why we need to store the unscaled action in the replay buffer instead of the final action actually taken in the environment.

We need to store the...
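Keeping two actions can be sketched like this, assuming (as the question implies) that the policy output already lives in [-1, 1] after tanh. Exploration noise is added and clipped in that normalized space, the clipped value is what would go into the replay buffer, and only a rescaled copy is sent to the environment. `sample_action` is a hypothetical helper, not SB3's `_sample_action()`, and Gaussian noise is an arbitrary choice for the example.

```python
import numpy as np


def unscale_action(scaled_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    # Map from [-1, 1] to the env bounds [low, high]
    return low + 0.5 * (scaled_action + 1.0) * (high - low)


def sample_action(policy_output, low, high, noise_std, rng):
    """Return (env_action, buffer_action) for one step (illustrative sketch).

    policy_output is assumed to already be in [-1, 1] (e.g. after tanh).
    """
    noisy = policy_output + noise_std * rng.standard_normal(policy_output.shape)
    # Stored action stays in the same [-1, 1] space the actor and critic work in
    buffer_action = np.clip(noisy, -1.0, 1.0)
    # Only the environment sees the rescaled action
    env_action = unscale_action(buffer_action, low, high)
    return env_action, buffer_action


rng = np.random.default_rng(0)
low, high = np.array([-10.0]), np.array([10.0])
env_action, buffer_action = sample_action(np.array([0.9]), low, high, noise_std=0.5, rng=rng)
```

Storing the normalized action keeps the critic's action input in the same range the actor produces, independently of the env bounds.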

[Question] Wrong scaled_action for continuous actions in `_sample_action()`?

> In particular, why multiply by 2 and subtract 1?

How would you do it otherwise?
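The factor 2 and the offset -1 come from composing two affine maps, assuming a bounded action $a \in [a_{\text{low}}, a_{\text{high}}]$. First normalize to [0, 1], then stretch and shift to the tanh range [-1, 1]. Since only one affine map sends $a_{\text{low}}$ to -1 and $a_{\text{high}}$ to 1, any affine rescaling with those endpoints is this same function:

```latex
\begin{aligned}
u &= \frac{a - a_{\text{low}}}{a_{\text{high}} - a_{\text{low}}} \in [0, 1]
  && \text{(normalize to the unit interval)} \\
s &= 2u - 1 \in [-1, 1]
  && \text{(stretch by 2, shift by } -1\text{)} \\
a &= a_{\text{low}} + \tfrac{1}{2}(s + 1)\,(a_{\text{high}} - a_{\text{low}})
  && \text{(inverse map, from } s \text{ back to the env range)}
\end{aligned}
```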

Stuck when training with DDPG

Hello, is the simulation asynchronous? Please fill in the custom gym env template completely. If you want a full video series on SB3 and car racing (with open source code), you...

Added missing metrics when logging on tensorboard (#1298)

@timothe-chaumont could you review/test this one?

Added missing metrics when logging on tensorboard (#1298)

@timothe-chaumont thanks for reviewing.

> Modify the HParam class to ask for metrics names only (without values)

Yes, it looks like a better fix, but `hparams` is asking for a non-empty...
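The "names only" idea can be sketched under two assumptions. First, the values in the metric dict only act as placeholders, and the real values are logged elsewhere under the same tags, as in the hyperparameter-logging example of the SB3 documentation. Second, the non-empty check is inferred from the truncated remark above. `metric_names_to_dict` is a hypothetical helper, not part of SB3's `HParam`.

```python
from typing import Dict, Iterable


def metric_names_to_dict(metric_names: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: build a placeholder metric dict from metric names only."""
    names = list(metric_names)
    # Assumption: hparams logging needs at least one metric to be declared
    if not names:
        raise ValueError("At least one metric name is required for hparams logging")
    # Placeholder values; actual values are assumed to be logged separately under these tags
    return {name: 0.0 for name in names}


metric_dict = metric_names_to_dict(["rollout/ep_len_mean", "train/value_loss"])
```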

Add next_observations and dones to RolloutBuffer

> [ ] I have raised an issue to propose this change ([required](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) for new features and bug fixes)

[Feature Request] Env checker for VecEnv

Hello,

> but I think it would be a good idea to allow to check the correctness of vectorized envs too.

Yes, that would be a good idea =)

> but...
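A VecEnv checker would likely start by verifying batch dimensions. The sketch below assumes the SB3 `VecEnv` convention where `reset()` returns a batched observation and `step()` returns `(obs, rewards, dones, infos)`. `check_vec_env_shapes` and `DummyZerosVecEnv` are hypothetical and not part of SB3's `env_checker`.

```python
import numpy as np


def _expect(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


def check_vec_env_shapes(vec_env, obs_shape, actions) -> None:
    """Hypothetical minimal checker: every output must have one entry per sub-env."""
    n = vec_env.num_envs
    obs = vec_env.reset()
    _expect(obs.shape == (n, *obs_shape), f"reset() obs shape {obs.shape}")
    obs, rewards, dones, infos = vec_env.step(actions)
    _expect(obs.shape == (n, *obs_shape), f"step() obs shape {obs.shape}")
    _expect(np.shape(rewards) == (n,), f"rewards shape {np.shape(rewards)}")
    _expect(np.shape(dones) == (n,), f"dones shape {np.shape(dones)}")
    _expect(len(infos) == n, f"expected {n} info dicts, got {len(infos)}")


class DummyZerosVecEnv:
    """Toy vectorized env used only to exercise the checker."""

    def __init__(self, num_envs: int, obs_shape: tuple) -> None:
        self.num_envs = num_envs
        self.obs_shape = obs_shape

    def reset(self) -> np.ndarray:
        return np.zeros((self.num_envs, *self.obs_shape), dtype=np.float32)

    def step(self, actions):
        obs = np.zeros((self.num_envs, *self.obs_shape), dtype=np.float32)
        rewards = np.zeros(self.num_envs, dtype=np.float32)
        dones = np.zeros(self.num_envs, dtype=bool)
        infos = [{} for _ in range(self.num_envs)]
        return obs, rewards, dones, infos
```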

Assert that the VecNormalize wrapper handles the new truncations correctly

Thanks for opening the issue =) After thinking more about it, I think the current implementation is correct: we reset the episodic return when a done signal is received. An...
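The reset on done can be illustrated with a minimal NumPy sketch of return-based reward normalization in the style of `VecNormalize`. A running discounted return per sub-env feeds a running variance estimate, rewards are divided by its standard deviation, and the return is zeroed wherever `dones` is true. The names are hypothetical and reward clipping is omitted. In this sketch a truncation also counts as a done, so the return is reset for both terminations and truncations.

```python
import numpy as np


class RunningMeanStd:
    """Running mean/variance using a parallel (batched) update."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # small prior count avoids division by zero

    def update(self, x: np.ndarray) -> None:
        batch_mean, batch_var, batch_count = np.mean(x), np.var(x), x.size
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m2 = self.var * self.count + batch_var * batch_count + delta**2 * self.count * batch_count / total
        self.var = m2 / total
        self.count = total


def normalize_rewards_step(returns, rewards, dones, ret_rms, gamma=0.99, epsilon=1e-8):
    # Discounted return of the ongoing episode, one entry per sub-env
    returns = returns * gamma + rewards
    ret_rms.update(returns)
    normalized = rewards / np.sqrt(ret_rms.var + epsilon)
    # Reset the return of sub-envs whose episode just ended
    returns = np.where(dones, 0.0, returns)
    return normalized, returns


ret_rms = RunningMeanStd()
returns = np.zeros(2)
normalized, returns = normalize_rewards_step(
    returns, np.array([1.0, 2.0]), np.array([False, True]), ret_rms
)
```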