Antonin RAFFIN comments

Results: 880 comments of Antonin RAFFIN

[Feature Request] Implement Recurrent SAC

> I am curious which gSDE ingredient does exactly help.

The consistent exploration. To solve this task, you need to build up momentum; having a bang-bang-like strategy is one way...
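To make the momentum argument concrete, here is a minimal sketch of a bang-bang strategy on a toy hill-climbing car. The task is not named in the comment, so the dynamics and constants below are an assumption (they follow the classic continuous mountain-car formulation), and `bang_bang`, `no_push`, and `rollout` are hypothetical helper names, not part of any library.

```python
import math

# Assumed constants, taken from the classic continuous mountain-car formulation.
POWER = 0.0015      # acceleration produced by a full-throttle action
GRAVITY = 0.0025    # scale of the slope-dependent gravity term
MAX_SPEED = 0.07
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.6, 0.45


def step(position, velocity, action):
    """Advance the toy car by one time step."""
    action = max(-1.0, min(1.0, action))
    velocity += action * POWER - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity


def rollout(policy, position=-0.5, max_steps=999):
    """Return the number of steps needed to reach the goal, or None on failure."""
    velocity = 0.0
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= GOAL_POS:
            return t
    return None


def bang_bang(position, velocity):
    """Always push at full throttle in the current direction of motion."""
    return 1.0 if velocity >= 0 else -1.0


def no_push(position, velocity):
    """Baseline that never applies any force."""
    return 0.0


if __name__ == "__main__":
    print("bang-bang solved:", rollout(bang_bang) is not None)
    print("no push solved:", rollout(no_push) is not None)
```

The bang-bang controller adds energy on every step because the force always points along the velocity, which is the kind of sustained, consistent action the comment refers to. Noise that flips sign at every step would tend to cancel itself out instead of building momentum.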

[Feature Request] STAC algorithm

Hello, are you willing to implement and benchmark the algorithm?

[Feature Request] STAC algorithm

> The algorithm is an Off-policy one. Is there any way or example to begin with this kind of algorithms?

https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/4 and please read https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md

Recurrent PPO

Hello, the traceback is not complete. I would suspect that the problem might come from Kaggle notebook, there is probably a timeout.

Recurrent PPO

> I copied just the pertinent part of the log

The traceback doesn't tell anything about why the process was terminated and nothing might relate it to SB3, it just...

Loading GPU trained RPPO on CPU

Hello, this is indeed a known problem, but it should not prevent you from using it at test time: it is only a `UserWarning`, not an error. Or do you...
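As an illustration of why a `UserWarning` need not block test-time use, the sketch below loads a saved model onto the CPU and collects any warnings instead of letting them interrupt the script. It assumes `sb3-contrib` is installed; `load_rppo_on_cpu` and the file name in the usage note are hypothetical.

```python
import warnings


def load_rppo_on_cpu(path):
    """Load a saved RecurrentPPO model on the CPU and return (model, warning messages)."""
    # Imported lazily so this helper can be defined without sb3-contrib installed.
    from sb3_contrib import RecurrentPPO

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, including repeats
        model = RecurrentPPO.load(path, device="cpu")

    # Keep only UserWarning messages so they can be inspected or logged.
    messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
    return model, messages
```

A call such as `model, notes = load_rppo_on_cpu("rppo_model.zip")` (hypothetical path) would return the model together with the text of any `UserWarning` raised during loading.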

Loading GPU trained RPPO on CPU

> Python file or kernel crashes a couple of seconds after UserWarning

What is your PyTorch version? Could you try upgrading?

> has stablebaselines3 version 1.8.0a4 and second machine has...
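When comparing two machines, it can help to collect the versions in one place. The following is a small sketch using only the standard library; `installed_versions` is a hypothetical helper, and the package names are the usual pip distribution names, assumed here.

```python
import platform
from importlib import metadata


def installed_versions(packages=("torch", "stable-baselines3", "sb3-contrib")):
    """Return a dict with the Python version, the OS, and each package's version."""
    report = {"python": platform.python_version(), "os": platform.system()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in installed_versions().items():
        print(f"{key}: {value}")
```

Running it on both the training machine and the loading machine gives two reports that can be pasted side by side in the issue.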

Loading GPU trained RPPO on CPU

Oh, I see, this looks like a PyTorch problem across operating systems. Can you confirm that you have no problem when you save and load on the same machine?

[Question] Rewriting the Stable Baseline Docs with MkDocs with good UI and UX

Hello, do you already have a live draft somewhere? Apart from the rst format, what are the main issues with the current documentation, in your view?

[Feature Request] Domain Randomization

Hello, thanks for the suggestion. It is true that domain randomization is independent of the RL algorithm, but in my mind, domain randomization is highly dependent on the environment, so...
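One way to read the point about environment dependence is that randomization lives next to the environment rather than inside the algorithm. The sketch below shows that idea with a plain-Python wrapper; `ToyEnv`, its `mass` attribute, and `DomainRandomizationWrapper` are all hypothetical and not part of Stable-Baselines3.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment with a single physical parameter."""

    def __init__(self):
        self.mass = 1.0

    def reset(self):
        return 0.0

    def step(self, action):
        observation = action / self.mass  # the parameter changes the dynamics
        return observation, 0.0, False, {}


class DomainRandomizationWrapper:
    """Resample selected environment attributes at the start of every episode."""

    def __init__(self, env, param_ranges, seed=None):
        for name in param_ranges:
            if not hasattr(env, name):
                raise AttributeError(f"environment has no parameter {name!r}")
        self.env = env
        self.param_ranges = param_ranges  # {attribute name: (low, high)}
        self.rng = random.Random(seed)

    def reset(self):
        # Which attributes exist and which ranges make sense is environment-specific.
        for name, (low, high) in self.param_ranges.items():
            setattr(self.env, name, self.rng.uniform(low, high))
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


if __name__ == "__main__":
    env = DomainRandomizationWrapper(ToyEnv(), {"mass": (0.5, 2.0)}, seed=0)
    env.reset()
    print("sampled mass:", env.env.mass)
```

The algorithm only ever sees `reset` and `step`, so the same wrapper works with any learner, while the choice of `param_ranges` has to be made per environment.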