Antonin RAFFIN
> It should be possible to be generic enough to get rid of "reset", "render" and "seed", and "getattr" elif clauses. You are probably looking for `env_method` as done in...
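To illustrate the "generic enough" idea, here is a minimal sketch of a command handler that forwards calls by name instead of keeping one `elif` clause per method. `DummyEnv` and `handle_command` are hypothetical names made up for this example, and the command strings are assumptions; this is not the actual worker code of the library.

```python
from typing import Any


class DummyEnv:
    """Stand-in environment used only for this illustration (hypothetical)."""

    def __init__(self) -> None:
        self.last_seed = None
        self.max_steps = 200

    def reset(self) -> int:
        return 0

    def seed(self, seed: int) -> list:
        self.last_seed = seed
        return [seed]


def handle_command(env: Any, cmd: str, data: Any) -> Any:
    """Dispatch a (cmd, data) message without a dedicated clause per method."""
    if cmd == "env_method":
        # data = (method_name, positional_args, keyword_args)
        method_name, args, kwargs = data
        return getattr(env, method_name)(*args, **kwargs)
    if cmd == "get_attr":
        return getattr(env, data)
    if cmd == "set_attr":
        name, value = data
        setattr(env, name, value)
        return None
    raise NotImplementedError(f"Unknown command: {cmd}")
```

With this shape, "reset", "render" and "seed" all go through the single `env_method` branch, e.g. `handle_command(env, "env_method", ("seed", (42,), {}))`.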
> I think the current implementation has already done this: this is for data collection only, the reset should be done when updating the networks too.
Hello, I do agree regarding observation and goal, and we can probably address it in https://github.com/DLR-RM/stable-baselines3/pull/704
Funny, I also recently gave it a try here: https://github.com/DLR-RM/stable-baselines3/tree/feat/n-steps
>The bug is basically the same as with memory optimization. :see_no_evil: Yep, I only did a quick test with it and could not see any improvement yet. >Completed it. Removed loops,...
>Ps. sorry for autoformatting >.< ... I will do a PR soon for that ;) Apparently it will be with black.
I'm currently giving that one a quick try (`feat/n-steps` in the zoo), and it already yields some interesting results with DQN on CartPole ;) And I couldn't notice any...
>Looks good! What about the FPS? It should have a very small impact, but there are still some optimizations that can be made. On DQN with CartPole, as mentioned, I couldn't...
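For readers following along, a minimal sketch of the n-step target being discussed, in plain Python. The function name `n_step_target` and its arguments are made up for this example and are not taken from the `feat/n-steps` branch. For DQN, `bootstrap_value` would be the max target Q-value at the state reached after n steps.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    bootstrap_value: float,
    gamma: float,
) -> float:
    """Discounted sum of up to n rewards, plus a bootstrap term if no episode end was hit."""
    target = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            # Episode terminated inside the window: no bootstrapping.
            return target
    # discount is now gamma ** n
    return target + discount * bootstrap_value
```

With a single reward (n = 1) this reduces to the usual one-step target `r + gamma * bootstrap_value`.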
>For sac, I am not sure if n-steps can be applied directly as I am under the impression that the backup requires the entropy for the intermediate states as well,...
I added a sketch of what it would look like for SAC; it fits in ~10 lines. We would need to allocate one more array `log_prob` of size `buffer_size` and...
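One possible reading of that idea, as a hedged sketch in plain Python: the stored log-probabilities of the intermediate actions are used to add the entropy term at each intermediate step. `soft_n_step_target` and its arguments are hypothetical names, and this sketch assumes the stored `log_probs` (from the policy at collection time) are used as-is, without any off-policy correction. It is not necessarily what the branch does.

```python
from typing import Sequence


def soft_n_step_target(
    rewards: Sequence[float],
    log_probs: Sequence[float],
    dones: Sequence[bool],
    next_q: float,
    next_log_prob: float,
    ent_coef: float,
    gamma: float,
) -> float:
    """Entropy-regularized n-step target (illustrative assumption, see text above)."""
    target = 0.0
    discount = 1.0
    for k, (reward, done) in enumerate(zip(rewards, dones)):
        # The first action is the one whose Q-value is being learned, so the
        # entropy bonus is only added for the intermediate actions (k >= 1).
        entropy_bonus = -ent_coef * log_probs[k] if k > 0 else 0.0
        target += discount * (reward + entropy_bonus)
        discount *= gamma
        if done:
            # Episode terminated inside the window: no bootstrapping.
            return target
    # Soft value of the state reached after n steps.
    return target + discount * (next_q - ent_coef * next_log_prob)
```

For n = 1 this falls back to the standard SAC target `r + gamma * (next_q - ent_coef * next_log_prob)`.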