Antonin RAFFIN
Hello, sorry for the delay.

> demonstrated that replacing Gaussian with a Beta distribution

Could you open a draft PR and run some benchmarks? (checking that we can reproduce some...
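For context, a minimal PyTorch sketch of the Beta-distribution idea for bounded continuous actions (this is an illustration only, not SB3's distribution API; the policy heads and bounds below are hypothetical):

```python
import math

import torch
from torch.distributions import Beta

# Hypothetical policy heads: softplus + 1 keeps alpha, beta > 1,
# so the Beta distribution stays unimodal.
latent = torch.randn(4, 64)  # dummy policy features, batch of 4
alpha_head = torch.nn.Linear(64, 1)
beta_head = torch.nn.Linear(64, 1)

alpha = torch.nn.functional.softplus(alpha_head(latent)) + 1.0
beta = torch.nn.functional.softplus(beta_head(latent)) + 1.0
dist = Beta(alpha, beta)

low, high = -2.0, 2.0  # e.g. Pendulum-v1 torque bounds
raw_action = dist.rsample()  # in [0, 1], reparameterized sample
action = low + (high - low) * raw_action  # rescale to [low, high]
# Change of variables for the affine rescale: subtract log|Jacobian|
log_prob = dist.log_prob(raw_action) - math.log(high - low)
```

Unlike a Gaussian, the Beta distribution has bounded support, so no clipping or squashing of sampled actions is needed, which is the usual motivation for this swap.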
Hello, thanks for the proposal, but I would actually prefer to have Rainbow first, see https://github.com/DLR-RM/stable-baselines3/issues/622 and related PRs like https://github.com/DLR-RM/stable-baselines3/pull/1622. We need help there, especially for benchmarking and having...
> Does that sound like a reasonable path going forward?

This sounds reasonable =) Make sure to read the [contributing guide](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md#pull-request-pr-and-review) to know what is expected. You might also have...
Hello, thanks for having a look at that. Apart from some tests failing, do the algorithms work in normal conditions? (for instance `PPO("MlpPolicy", "Pendulum-v1", device="mps").learn(10_000)`) (In theory, if PyTorch supports...
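For reference, the full sanity check might look like this (a sketch, assuming a recent PyTorch build with the MPS backend and SB3 installed):

```python
import torch
from stable_baselines3 import PPO

# Quick device sanity check before training on Apple Silicon
assert torch.backends.mps.is_available(), "MPS backend not available"

# Short PPO run on the MPS device, as in the snippet above
model = PPO("MlpPolicy", "Pendulum-v1", device="mps", verbose=1)
model.learn(10_000)
```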
I think most issues are related to NumPy v2 and should be fixed in https://github.com/DLR-RM/stable-baselines3/pull/2041 too.
Hello, thanks for the proposal.

> I acknowledge that GRPO was originally developed for LLM fine-tuning. My implementation elegantly uses seeds to enable sampling of multiple trajectories,

Similar to what...
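For illustration only, here is a minimal sketch of the seeded group-sampling idea with Gymnasium (the helper name and signature are hypothetical; this is not the proposed implementation):

```python
import gymnasium as gym
import numpy as np


def sample_group(env_id: str, policy, seed: int, group_size: int = 8):
    """Hypothetical helper: roll out `group_size` trajectories from the
    same seeded initial state and compute group-relative advantages."""
    returns = []
    for _ in range(group_size):
        env = gym.make(env_id)
        obs, _ = env.reset(seed=seed)  # same seed => same initial state
        done, total_reward = False, 0.0
        while not done:
            action = policy(obs)  # stochastic policy => trajectories differ
            obs, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            done = terminated or truncated
        returns.append(total_reward)
    returns = np.asarray(returns)
    # GRPO-style advantage: normalize returns within the group
    return (returns - returns.mean()) / (returns.std() + 1e-8)
```

Because the environment's RNG is seeded identically at each reset, the only source of variation across the group is the policy's own stochasticity, which is the point of the seed trick.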
Hello, unfortunately I don't have any specific task in mind; you might have a look at https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/101 (and ask there).
Related: https://github.com/adysonmaia/sb3-plus (I rediscovered it recently)
> Make HybridPPO inherit from [OnPolicyAlgorithm](https://github.com/DLR-RM/stable-baselines3/blob/b018e4bc949503b990c3012c0e36c9384de770e6/stable_baselines3/common/on_policy_algorithm.py#L21)

This is fine; I don't think there should be too much duplicated code because of that, no? (that's what we do already for the...
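A rough skeleton of that inheritance (hypothetical; only the `train()` override is sketched, the class body is a placeholder):

```python
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm


class HybridPPO(OnPolicyAlgorithm):
    """Hypothetical skeleton: rollout collection, logging and `learn()`
    are inherited from OnPolicyAlgorithm, as they are for PPO/A2C."""

    def train(self) -> None:
        # A PPO-style clipped update over self.rollout_buffer would go here;
        # only the log-prob/entropy handling for the hybrid action space
        # needs to differ from standard PPO.
        raise NotImplementedError
```

Since `OnPolicyAlgorithm` already owns the rollout loop, the duplicated code should indeed be limited to the update step.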
CLA signed now