Antonin RAFFIN
Hello, that would indeed be a good addition to SB3 contrib. Would you be willing to contribute?
Hello, thanks for the PR =) > The functionality/performance matches that of the source (required for new training algorithms or training-related features). Please don't forget that part (see the contributing guide)....
yes please =)
> In some rare cases (encountered once), noise sampling in gSDE can break. I think we need to activate `use_expln=True`; this should prevent the log std from exploding. Or use `AdamW`, yes.
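Not from the original thread, just a minimal sketch of how those two suggestions could be applied in SB3 (the algorithm, env name, and hyperparameters are placeholders for illustration):

```python
import torch
from stable_baselines3 import PPO

# Sketch only: enable gSDE with use_expln=True to keep the log std from
# exploding, and swap the optimizer to AdamW via policy_kwargs.
model = PPO(
    "MlpPolicy",
    "Pendulum-v1",  # example env, not from the original discussion
    use_sde=True,   # use gSDE exploration
    policy_kwargs=dict(
        use_expln=True,  # use expln() instead of exp() when computing the std
        optimizer_class=torch.optim.AdamW,
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```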
Hello, > I'm using MaskablePPO on a powerful computer, but the training speed doesn't change compared to a normal computer. Is there any option or line of code...
> I propose compatibility with other RL algorithms beyond PPO, specifically A2C. A2C is already covered by the recurrent PPO implementation: https://arxiv.org/abs/2205.09123 > introducing various recurrent neural networks (RNNs) like...
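For reference, a minimal usage sketch of the recurrent PPO implementation from SB3 contrib (the env and timesteps are placeholders, not from the issue; the note about recovering an A2C-style update is an assumption for illustration):

```python
from sb3_contrib import RecurrentPPO

# Sketch: recurrent PPO with an LSTM policy from sb3-contrib.
# With a single epoch per rollout (n_epochs=1) the update becomes close to
# an A2C-style update, which is why a separate recurrent A2C is not needed.
model = RecurrentPPO(
    "MlpLstmPolicy",
    "CartPole-v1",  # example env
    n_epochs=1,
    verbose=1,
)
model.learn(total_timesteps=10_000)
```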
Hello, this would definitely be a good addition to SB3 contrib. Make sure to read the contributing guide carefully. You might have a look at the R2D2 paper (https://paperswithcode.com/method/r2d2) and https://github.com/zhihanyang2022/off-policy-continuous-control....
Hello, thanks for reporting the updated results =). Do you have a diagram to share for RSAC vs RSAC_s maybe? (that would make things easier to discuss) Did you also...
> As you can see, RSAC_S shares the RNN state between the actor and the critic, but only the actor can change the RNN state. Whereas in RSAC, the actor and critics...
I can help you with that; the continuous version has a deceptive reward and needs quite a lot of exploration noise. EDIT: working hyperparameters: https://github.com/DLR-RM/rl-baselines3-zoo/blob/8cecab429726d7e6aaebd261d26ed8fc23b7d948/hyperparams/sac.yml#L2 or https://github.com/DLR-RM/rl-baselines3-zoo/blob/8cecab429726d7e6aaebd261d26ed8fc23b7d948/hyperparams/td3.yml#L5-L6 (note: the gSDE exploration is...
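Not part of the original comment: a sketch of how gSDE exploration can be turned on for SAC on such an environment. The env name and hyperparameters below are assumptions; the tuned values are in the zoo files linked above.

```python
from stable_baselines3 import SAC

# Sketch: state-dependent exploration (gSDE) helps escape the deceptive reward
# of the continuous task; MountainCarContinuous-v0 is an assumed example.
# See the linked sac.yml / td3.yml entries in rl-baselines3-zoo for tuned values.
model = SAC(
    "MlpPolicy",
    "MountainCarContinuous-v0",
    use_sde=True,        # enable gSDE exploration noise
    learning_rate=3e-4,  # placeholder, not the tuned value
    verbose=1,
)
model.learn(total_timesteps=50_000)
```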