Antonin RAFFIN
> Yet it results in the same code run twice

What do you mean? It sounds like this is expected: the synchronization is done when computing the gradients.
> the state of recurrent units such as LSTM are part of the environment (aka universe) state.

I would rather say that the state of the LSTM, which is in fact...
> but would you still be interested in a PR once im done?

This feature should be a `gym.Wrapper`, so it is independent of the backend.

> This could be added in the...
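To illustrate the wrapper pattern mentioned above: a wrapper intercepts `reset()`/`step()` and can change observations, rewards, or termination without touching the wrapped environment, which is what makes it backend-independent. This is a minimal sketch of the idea; the stand-in classes and the `ScaleRewardWrapper` name are hypothetical (plain Python is used instead of `gym.Env`/`gym.Wrapper` so the snippet is self-contained).

```python
class DummyEnv:
    """Stand-in environment: counts steps, episode ends after 3 steps."""

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        # obs, reward, done, info -- the classic gym step signature
        return float(self.t), 1.0, self.t >= 3, {}


class ScaleRewardWrapper:
    """Hypothetical wrapper: rescales the reward, leaves everything else untouched."""

    def __init__(self, env, scale=0.5):
        self.env = env
        self.scale = scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.scale, done, info


env = ScaleRewardWrapper(DummyEnv(), scale=0.5)
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, done, info = env.step(0)
    total += reward
print(total)  # 3 steps * reward 1.0 * scale 0.5 = 1.5
```

With a real `gym.Wrapper` subclass, the same logic would apply to any environment regardless of which RL library trains on it.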
> could you please provide an example of how to compute the sample efficiency of an RL algorithm?

It looks like both the `Monitor` wrapper and `EvalCallback` should do the trick...
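As a concrete sketch of what "sample efficiency" means here: the `Monitor` wrapper logs one CSV row per episode (reward `r`, length `l` in timesteps, wall-clock `t`), so you can count how many environment steps were needed before the agent first reached a target reward. The CSV sample and the helper below are illustrative assumptions, not actual stable-baselines code.

```python
import csv
import io

# Hypothetical Monitor-style episode log:
# r = episode reward, l = episode length (timesteps), t = wall-clock time
monitor_csv = """\
r,l,t
10.0,100,1.2
50.0,120,2.5
120.0,110,3.8
200.0,105,5.0
"""


def timesteps_to_threshold(text, reward_threshold):
    """Cumulative timesteps until an episode reward first reaches the threshold."""
    total_steps = 0
    for row in csv.DictReader(io.StringIO(text)):
        total_steps += int(row["l"])
        if float(row["r"]) >= reward_threshold:
            return total_steps
    return None  # threshold never reached


print(timesteps_to_threshold(monitor_csv, 100.0))  # 100 + 120 + 110 = 330
```

A lower number of timesteps to reach the threshold means higher sample efficiency; `EvalCallback` gives the same kind of signal using periodic evaluations on a separate environment.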
Hello,

Thanks for reporting the issue. I tried the following code (note the `learning_starts=0` to avoid a wrong estimation of the FPS):

```python
import time
from stable_baselines import HER, SAC
from...
```
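The snippet above is truncated, but the underlying idea (timing a run to estimate frames per second) can be sketched generically. The `measure_fps` helper below is a hypothetical illustration, not the code from the original comment:

```python
import time


def measure_fps(step_fn, n_steps=1000):
    """Run step_fn n_steps times and return the measured steps per second."""
    start = time.time()
    for _ in range(n_steps):
        step_fn()
    elapsed = max(time.time() - start, 1e-9)  # guard against zero division
    return n_steps / elapsed


# Example: time a trivial step function
fps = measure_fps(lambda: None, n_steps=1000)
print(fps > 0)
```

Setting `learning_starts=0` matters for such a measurement because otherwise the first steps skip gradient updates entirely, inflating the apparent FPS.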
I don't have the time to deal with this issue now, but you could use [line profiler](https://github.com/rkern/line_profiler) to check what is taking so much time.
Thanks @tirafesi, I assume that replacing the list-based replay buffer with a numpy-based one would solve the issue... You have an example of it in the tf2 draft: https://github.com/Stable-Baselines-Team/stable-baselines-tf2/blob/master/stable_baselines/common/buffers.py...
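The gist of a numpy-based replay buffer is to preallocate fixed-size arrays and overwrite them circularly, instead of growing a Python list. This is a sketch in the spirit of the linked tf2 draft; the class and field names are illustrative, not the actual stable-baselines implementation.

```python
import numpy as np


class ReplayBuffer:
    """Sketch of a preallocated, circular numpy replay buffer."""

    def __init__(self, size, obs_dim):
        self.size = size
        self.pos = 0
        self.full = False
        self.obs = np.zeros((size, obs_dim), dtype=np.float32)
        self.actions = np.zeros((size, 1), dtype=np.float32)
        self.rewards = np.zeros((size,), dtype=np.float32)
        self.next_obs = np.zeros((size, obs_dim), dtype=np.float32)
        self.dones = np.zeros((size,), dtype=np.float32)

    def add(self, obs, action, reward, next_obs, done):
        self.obs[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.next_obs[self.pos] = next_obs
        self.dones[self.pos] = float(done)
        self.pos = (self.pos + 1) % self.size  # circular overwrite
        if self.pos == 0:
            self.full = True

    def sample(self, batch_size):
        upper = self.size if self.full else self.pos
        idx = np.random.randint(0, upper, size=batch_size)
        return (self.obs[idx], self.actions[idx], self.rewards[idx],
                self.next_obs[idx], self.dones[idx])


buffer = ReplayBuffer(size=100, obs_dim=4)
for i in range(10):
    buffer.add(np.ones(4) * i, 0.0, float(i), np.ones(4) * (i + 1), i == 9)
obs_b, act_b, rew_b, next_b, done_b = buffer.sample(5)
print(obs_b.shape)  # (5, 4)
```

Since sampling is just fancy indexing into contiguous arrays, both insertion and batch sampling stay O(1) per element, which is why this fixes the slowdown seen with the list-based buffer.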
@toksis I will delete the comments, as they are not related to stable-baselines or this issue, but to line-profiler.
Apparently, the problem is solved in v3: https://github.com/hill-a/stable-baselines/issues/845#issuecomment-639754121 because of the new replay buffer implementation.
Hello, please fill in the issue template completely.