Antonin RAFFIN
Hello, sorry for the delay.

> demonstrated that replacing Gaussian with a Beta distribution

Could you open a draft PR and run some benchmarks? (checking that we can reproduce some...
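For context, a minimal PyTorch sketch of the Beta-distribution idea for bounded continuous actions (this is an illustration only, not SB3's distribution API; the policy heads and bounds below are hypothetical):

```python
import math

import torch
from torch.distributions import Beta

# Hypothetical policy heads: softplus + 1 keeps alpha, beta > 1,
# so the Beta distribution stays unimodal.
latent = torch.randn(4, 64)  # dummy policy features, batch of 4
alpha_head = torch.nn.Linear(64, 1)
beta_head = torch.nn.Linear(64, 1)

alpha = torch.nn.functional.softplus(alpha_head(latent)) + 1.0
beta = torch.nn.functional.softplus(beta_head(latent)) + 1.0
dist = Beta(alpha, beta)

low, high = -2.0, 2.0  # e.g. Pendulum-v1 torque bounds
raw_action = dist.rsample()  # in [0, 1], reparameterized sample
action = low + (high - low) * raw_action  # rescale to [low, high]
# Change of variables for the affine rescale: subtract log|Jacobian|
log_prob = dist.log_prob(raw_action) - math.log(high - low)
```

Unlike a Gaussian, the Beta distribution has bounded support, so no clipping or squashing of sampled actions is needed, which is the usual motivation for this swap.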
Hello, thanks for the proposal, but I would actually prefer to have Rainbow first, see https://github.com/DLR-RM/stable-baselines3/issues/622 and related PRs like https://github.com/DLR-RM/stable-baselines3/pull/1622. We need help there, especially for benchmarking and having...
> Does that sound like a reasonable path going forward?

This sounds reasonable =) Make sure to read the [contributing guide](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md#pull-request-pr-and-review) to know what is expected. You might also have...
Hello, thanks for having a look at that. Apart from some tests failing, do the algorithms work in normal conditions? (for instance `PPO("MlpPolicy", "Pendulum-v1", device="mps").learn(10_000)`) (In theory, if PyTorch supports...
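For reference, the full sanity check might look like this (a sketch, assuming a recent PyTorch build with the MPS backend and SB3 installed):

```python
import torch
from stable_baselines3 import PPO

# Quick device sanity check before training on Apple Silicon
assert torch.backends.mps.is_available(), "MPS backend not available"

# Short PPO run on the MPS device, as in the snippet above
model = PPO("MlpPolicy", "Pendulum-v1", device="mps", verbose=1)
model.learn(10_000)
```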
I think most issues are related to NumPy v2 and should be fixed in https://github.com/DLR-RM/stable-baselines3/pull/2041 too.
Hello, thanks for the proposal.

> I acknowledge that GRPO was originally developed for LLM fine-tuning. My implementation elegantly uses seeds to enable sampling of multiple trajectories,

Similar to what...
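For illustration only, here is a minimal sketch of the seeded group-sampling idea with Gymnasium (the helper name and signature are hypothetical; this is not the proposed implementation):

```python
import gymnasium as gym
import numpy as np


def sample_group(env_id: str, policy, seed: int, group_size: int = 8):
    """Hypothetical helper: roll out `group_size` trajectories from the
    same seeded initial state and compute group-relative advantages."""
    returns = []
    for _ in range(group_size):
        env = gym.make(env_id)
        obs, _ = env.reset(seed=seed)  # same seed => same initial state
        done, total_reward = False, 0.0
        while not done:
            action = policy(obs)  # stochastic policy => trajectories differ
            obs, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            done = terminated or truncated
        returns.append(total_reward)
    returns = np.asarray(returns)
    # GRPO-style advantage: normalize returns within the group
    return (returns - returns.mean()) / (returns.std() + 1e-8)
```

Because the environment's RNG is seeded identically at each reset, the only source of variation across the group is the policy's own stochasticity, which is the point of the seed trick.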
Hello, unfortunately I don't have any specific task in mind; you might have a look at https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/101 (and ask there).
Related: https://github.com/adysonmaia/sb3-plus (I rediscovered it recently)
> Make HybridPPO inherit from [OnPolicyAlgorithm](https://github.com/DLR-RM/stable-baselines3/blob/b018e4bc949503b990c3012c0e36c9384de770e6/stable_baselines3/common/on_policy_algorithm.py#L21)

This is fine; I don't think there should be too much duplicated code because of that, no? (that's what we do already for the...
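A rough skeleton of that inheritance (hypothetical; only the `train()` override is sketched, the class body is a placeholder):

```python
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm


class HybridPPO(OnPolicyAlgorithm):
    """Hypothetical skeleton: rollout collection, logging and `learn()`
    are inherited from OnPolicyAlgorithm, as they are for PPO/A2C."""

    def train(self) -> None:
        # A PPO-style clipped update over self.rollout_buffer would go here;
        # only the log-prob/entropy handling for the hybrid action space
        # needs to differ from standard PPO.
        raise NotImplementedError
```

Since `OnPolicyAlgorithm` already owns the rollout loop, the duplicated code should indeed be limited to the update step.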
CLA signed now