Antonin RAFFIN
> Would it be implemented to allow loading models trained with an old version of SB3?

Yes, but probably in a separate PR.

> The argument will still exist...
> Because the timeout is handled at sampling time for the classic replay buffer (normally).

Yes, but we can probably do the same for online sampling.
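For reference, a minimal sketch of what "handling the timeout at sampling time" can look like (standalone placeholder arrays here, modeled on how the standard replay buffer masks timeouts, not the exact code of this PR):

```python
import numpy as np

# Placeholder buffer arrays: `dones` marks episode ends, `timeouts` marks
# ends caused only by the time limit (truncation), not a real termination.
buffer_size, batch_size = 1000, 4
dones = np.zeros(buffer_size, dtype=np.float32)
timeouts = np.zeros(buffer_size, dtype=np.float32)

batch_inds = np.random.randint(0, buffer_size, size=batch_size)
# Mask out timeouts at sampling time: a timeout should not be treated as a
# true terminal state when bootstrapping the value of the next state.
effective_dones = dones[batch_inds] * (1.0 - timeouts[batch_inds])
```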
> I have one major concern though: the code is too slow!

That's because you removed the max episode length, right? Btw, was that needed? Now the implementation seems really...
> (episode_idx, trans_idx, env_idx), then you would have to manage a list of indexes of transitions and episodes for each environment.

It seems pretty complicated. I didn't even try. Actually,...
> Let me check the current code; we might still reference the more consistent implementation for people interested, but I'm not sure we will keep it, mainly for the two...
Looking at the test, the +1 for the future strategy does actually make a big difference? (the performance test was failing before, even with almost twice the budget)
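For context, a toy sketch of why an off-by-one in the "future" goal index window can matter (purely illustrative arithmetic, not the code from this PR): shifting the window by one step decides whether the achieved goal reached right after the current transition can be drawn, and that is the only draw that is guaranteed to relabel the transition as a success.

```python
import numpy as np

rng = np.random.default_rng(0)
ep_length = 10   # hypothetical episode length (achieved goals indexed 0..ep_length)
trans_idx = 4    # transition being relabeled (goes from state 4 to state 5)

# Inclusive window: the first eligible index is trans_idx + 1, i.e. the achieved
# goal right after this transition; drawing it makes the relabeled transition a
# guaranteed success, so the reward signal is much denser.
goal_idx_inclusive = rng.integers(trans_idx + 1, ep_length + 1)

# Window shifted by one: the transition's own outcome can never be drawn, so the
# relabeled transition may still look like a failure.
goal_idx_shifted = rng.integers(trans_idx + 2, ep_length + 1)
```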
> Btw, there is no reason that online sampling gives better results.

There is. In fact, in my experience, the two are not equivalent (and the online sampling usually but...
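To make the distinction concrete, a rough sketch of the two relabeling schemes (simplified dict-based transitions, not the actual buffer code): offline sampling creates a fixed set of virtual transitions when the episode is stored, while online sampling re-draws goals every time a batch is sampled, so the same stored transition can be paired with different goals over training.

```python
import random

def relabel(transition, new_goal, compute_reward):
    """Copy of the transition with a substituted goal and recomputed reward."""
    t = dict(transition)
    t["desired_goal"] = new_goal
    # Mirrors the goal-env signature compute_reward(achieved_goal, desired_goal, info)
    t["reward"] = compute_reward(t["achieved_goal"], new_goal, {})
    return t

# Offline ("at storage time"): virtual transitions are created once and frozen.
def store_episode_offline(buffer, episode, n_sampled_goal, compute_reward):
    for i, transition in enumerate(episode):
        buffer.append(transition)
        future = episode[i + 1:]
        for _ in range(n_sampled_goal):
            if future:
                goal = random.choice(future)["achieved_goal"]
                buffer.append(relabel(transition, goal, compute_reward))

# Online ("at sampling time"): goals are re-drawn for every batch.
def sample_online(episodes, batch_size, her_ratio, compute_reward):
    batch = []
    for _ in range(batch_size):
        episode = random.choice(episodes)
        i = random.randrange(len(episode))
        transition = episode[i]
        if random.random() < her_ratio and i + 1 < len(episode):
            goal = random.choice(episode[i + 1:])["achieved_goal"]
            transition = relabel(transition, goal, compute_reward)
        batch.append(transition)
    return batch
```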
> But, as @araffin suspected, it does not work when it is a SubprocVecEnv..

Yes, because the `compute_reward` function is in another process, not directly accessible from the main one....
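As a side note, one way around that (a sketch, where "MyGoalEnv-v0" is a placeholder for any goal-conditioned env implementing `compute_reward(achieved_goal, desired_goal, info)`) is to go through the VecEnv `env_method` call, which forwards the call to the worker process instead of touching the env object directly:

```python
import numpy as np
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env():
    import gym
    # Placeholder goal-conditioned env, replace with a real registered env.
    return gym.make("MyGoalEnv-v0")

if __name__ == "__main__":
    vec_env = SubprocVecEnv([make_env for _ in range(2)])

    achieved = np.zeros(3)
    desired = np.ones(3)

    # Direct attribute access fails: the env lives in a worker process, so
    # there is no local object exposing compute_reward.
    # reward = vec_env.envs[0].compute_reward(achieved, desired, {})  # AttributeError

    # env_method forwards the call to the worker(s) and returns one result per env.
    rewards = vec_env.env_method("compute_reward", achieved, desired, {}, indices=[0])
    print(rewards[0])
```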
> Maybe it is just a statistical effect.

Most probably, yes. I would also do some testing with harder envs (cf. the RL Zoo with highway-env and then other...
Hello, sorry for the late reply (I'm on holidays, I will try to write a longer answer next week). I know that our current architecture is not really flexible (that...