Antonin RAFFIN


Hello, that would indeed be a good addition to SB3 contrib. Would you be willing to contribute it?

Hello, thanks for the PR =)

> The functionality/performance matches that of the source (required for new training algorithms or training-related features).

Please don't forget that part (see the contributing guide)....

> In some rare cases (encountered once), noise sampling in gSDE can break.

I think we need to activate `use_expln=True`, which should prevent log std explosion, or use `AdamW`, yes.
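For context, here is a minimal sketch (not part of the original reply) of enabling gSDE with `use_expln=True` via `policy_kwargs` in SB3; the environment and timestep budget are arbitrary examples:

```python
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    "Pendulum-v1",  # arbitrary continuous-control example
    use_sde=True,   # generalized state-dependent exploration (gSDE)
    # use_expln replaces exp() with the smoother expln() transform for the
    # standard deviation, which avoids log std explosion during training
    policy_kwargs=dict(use_expln=True),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```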

Hello,

> I'm using MaskablePPO on a powerful computer but the speed of the training doesn't change compared to a normal computer. Is there any option or line of code...

> I propose compatibility with other RL algorithms beyond PPO, specifically A2C.

A2C is already covered by the recurrent PPO implementation, since A2C is a special case of PPO: https://arxiv.org/abs/2205.09123 (see the sketch below)

> introducing various recurrent neural networks (RNNs) like...
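To illustrate that equivalence, here is a rough sketch (not from the original comment) of configuring SB3 contrib's `RecurrentPPO` so that its update reduces to an A2C-style update, following https://arxiv.org/abs/2205.09123; the environment, step counts, and single-environment batch size are illustrative assumptions:

```python
from sb3_contrib import RecurrentPPO

# With a single epoch over the full rollout, the importance ratio stays at 1,
# so PPO's clipping never triggers and the loss matches the A2C policy gradient.
model = RecurrentPPO(
    "MlpLstmPolicy",
    "CartPole-v1",              # arbitrary example environment
    n_steps=5,                  # short rollouts, as in A2C
    batch_size=5,               # one full-batch update per rollout (n_steps * n_envs)
    n_epochs=1,                 # single gradient step per rollout
    gae_lambda=1.0,             # SB3's A2C default (no GAE smoothing)
    normalize_advantage=False,  # A2C does not normalize advantages
    verbose=1,
)
model.learn(total_timesteps=10_000)
```

One remaining difference is the optimizer: SB3's A2C uses RMSprop by default while PPO uses Adam, so the updates match in form but not numerically unless the optimizers are aligned as well.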

Hello, this would definitely be a good addition to SB3 contrib. Make sure to read the contributing guide carefully. You might have a look at the R2D2 paper (https://paperswithcode.com/method/r2d2) and https://github.com/zhihanyang2022/off-policy-continuous-control....

Hello, thanks for reporting the updated results =). Do you have a diagram to share for RSAC vs RSAC_s maybe? (that would make things easier to discuss) Did you also...

> As you can see, RSAC_S shares the RNN state between the actor and the critic, but only the actor can change the RNN state. Whereas in RSAC, actor and critics...

I can help you with that: the continuous version has a deceptive reward and needs quite some exploration noise.

EDIT: working hyperparameters: https://github.com/DLR-RM/rl-baselines3-zoo/blob/8cecab429726d7e6aaebd261d26ed8fc23b7d948/hyperparams/sac.yml#L2 or https://github.com/DLR-RM/rl-baselines3-zoo/blob/8cecab429726d7e6aaebd261d26ed8fc23b7d948/hyperparams/td3.yml#L5-L6 (note: the gSDE exploration is...
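To illustrate the exploration-noise point, here is a minimal sketch (not from the original comment) of TD3 with Ornstein-Uhlenbeck action noise on MountainCarContinuous-v0, a classic continuous task with a deceptive reward; the noise scale is an assumed illustrative value, see the linked zoo configs for the actual tuned hyperparameters:

```python
import numpy as np

from stable_baselines3 import TD3
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise

# MountainCarContinuous-v0 has a 1D action space; without strong exploration
# noise, the agent settles for the deceptive "do nothing" local optimum.
n_actions = 1
action_noise = OrnsteinUhlenbeckActionNoise(
    mean=np.zeros(n_actions),
    sigma=0.5 * np.ones(n_actions),  # illustrative scale, not the tuned value
)

model = TD3(
    "MlpPolicy",
    "MountainCarContinuous-v0",
    action_noise=action_noise,
    verbose=1,
)
model.learn(total_timesteps=50_000)
```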