Antonin RAFFIN

Results: 880 comments by Antonin RAFFIN

> I have raised an issue to propose this change ([required](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) for new features and bug fixes) this step is important to discuss the issue and see if a fix/feature...

Hello, that's correct because there is currently only `RecurrentPPO` that makes use of `states` (LSTM states) and episode starts (to reset the states).
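To illustrate what "episode starts (to reset the states)" means, here is a minimal, library-free sketch. It is not the SB3 or `RecurrentPPO` code: `mask_states` and `toy_recurrent_step` are hypothetical helpers, and the "recurrent update" is a toy accumulation used as a stand-in for an LSTM. The only point is that the state of an environment is zeroed whenever its episode-start flag is set.

```python
from typing import List, Sequence


def mask_states(
    states: List[List[float]], episode_starts: Sequence[bool]
) -> List[List[float]]:
    """Zero the recurrent state of every environment whose episode just started."""
    return [
        [0.0] * len(state) if start else list(state)
        for state, start in zip(states, episode_starts)
    ]


def toy_recurrent_step(
    states: List[List[float]],
    observations: Sequence[float],
    episode_starts: Sequence[bool],
) -> List[List[float]]:
    """Toy recurrent update (assumption, not an LSTM): reset, then add the observation."""
    states = mask_states(states, episode_starts)
    return [[h + obs for h in state] for state, obs in zip(states, observations)]


# Two parallel environments, each with a hidden state of size 2.
states = [[0.0, 0.0], [0.0, 0.0]]
# First step: both episodes start, so both states are reset before the update.
states = toy_recurrent_step(states, [1.0, 1.0], [True, True])
# Second step: only the second environment starts a new episode.
states = toy_recurrent_step(states, [2.0, 2.0], [False, True])
print(states)
```

In this sketch the first environment keeps accumulating across steps, while the second one starts again from a zero state because its flag was set.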

To continue the discussion from the SB3 issue, I had a closer look at GRPO. The algorithm was actually described in https://arxiv.org/abs/2402.03300 and seems to be specific to LLM training (which...

Hello, thanks for the proposal, what is missing currently? where would you need some help?

Hello, SBX has limited support for it (when all the items are boxes).

> do you have this feature in your roadmap? If yes when is it expected to...

Hello, could you provide the error message too?

Do you see any difference in the filenames/logs compared to SB3?

Might be related to https://github.com/DLR-RM/stable-baselines3/issues/2061. Help is welcome to solve the issue =)

@OliverUrbann could you try with https://github.com/DLR-RM/stable-baselines3/pull/2063 ? it might solve your issue

Thanks for trying =) I've dug more into the issue and I think I found the root cause. The problem comes from the W&B client: https://github.com/wandb/wandb/blob/8dd25cab52da3603022e75322c847de4def21b1c/wandb/integration/gym/__init__.py#L68 With Gymnasium v1.0, the previous...