Antonin RAFFIN
Hello,

> I'll summarize it before I start working on the documentation itself.

Thanks =) Your description is right:
- the `result_plotter` is mostly interesting for its `ts2xy` and `window_func`, ...
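For reference, a minimal sketch of how `ts2xy` and `window_func` from `stable_baselines3.common.results_plotter` are typically used (the `./logs/` folder here is a hypothetical directory containing `monitor.csv` files written by the `Monitor` wrapper):

```python
import numpy as np

from stable_baselines3.common.results_plotter import load_results, ts2xy, window_func

# Load the monitor.csv files and convert them to (timesteps, episode reward) arrays
x, y = ts2xy(load_results("./logs/"), "timesteps")

# Smooth the reward curve with a moving average over the last 100 episodes
# (requires at least 100 recorded episodes)
x_smooth, y_smooth = window_func(x, y, 100, np.mean)
```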
> What about sticky actions?

Do you mean their influence on performance? I don't know, I think the main issue is that the changes were made without a benchmark (in the paper, ...
> TL;DR, sticky actions are the recommended way to prevent agents from abusing determinism, not a way to improve rewards.

Thanks for your answer =) My question was not about ...
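For context, a minimal sketch of enabling sticky actions explicitly on an ALE environment (`Breakout` and the 0.25 repeat probability are just the usual illustrative choices; requires `ale-py`):

```python
import ale_py
import gymnasium as gym

# Register the ALE environments (needed with recent ale-py/Gymnasium versions)
gym.register_envs(ale_py)

# v5 ALE environments already enable sticky actions by default:
# with probability 0.25 the previous action is repeated instead of the new one
env = gym.make("ALE/Breakout-v5", repeat_action_probability=0.25)
```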
> Is the issue that people are using existing SB3 results in their papers, and might mistakenly attribute the charts that you have now

Yes, that's the issue. And not ...
> Would it be possible to set up a system for people to contribute individual runs?

Yes =D, that's the whole point of the openrl benchmark initiative by @vwxyzjn (best ...
Linking the discussion with @JesseFarebro, as it's relevant to this issue: https://github.com/DLR-RM/stable-baselines3/pull/572#issuecomment-993701078 (also relevant: https://github.com/DLR-RM/stable-baselines3/pull/734)

EDIT: it seems that the Gymnasium documentation is outdated.
Closing in favor of https://github.com/DLR-RM/stable-baselines3/pull/704
Alternative solution from https://github.com/martius-lab/pink-noise-rl:

```python
import gymnasium as gym
from pink import PinkNoiseDist
from stable_baselines3 import SAC

env = gym.make("MountainCarContinuous-v0")  # any continuous-action env
action_dim = env.action_space.shape[-1]
seq_len = env.spec.max_episode_steps

# Initialize agent
model = SAC("MlpPolicy", env)

# Set action noise (check the pink-noise-rl README for the exact argument order)
model.actor.action_dist = PinkNoiseDist(action_dim, seq_len)

# Train agent
model.learn(total_timesteps=10_000)
```
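If I remember correctly, the package is published on PyPI as `pink-noise-rl` but imported as `pink`; see the repository README for the exact installation and usage instructions.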
It seems that @adysonmaia implemented PPO with dict action space support here: https://github.com/adysonmaia/sb3-plus/blob/main/sb3_plus/mimo_ppo/ppo.py#L24
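For context, a minimal sketch of an environment with a `Dict` action space in Gymnasium (the environment itself and the space names are purely illustrative); standard SB3 PPO does not support this kind of action space, which is what the linked implementation adds:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class DictActionEnv(gym.Env):
    """Toy environment whose action is a dictionary (illustrative only)."""

    observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
    action_space = spaces.Dict(
        {
            "discrete": spaces.Discrete(3),
            "continuous": spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),
        }
    )

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        # `action` is a dict with keys "discrete" and "continuous"
        reward = float(np.sum(action["continuous"]))
        return self.observation_space.sample(), reward, False, False, {}
```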
> Any updates about the rainbow implementation?

Contributions are welcome ;) (if you do so, please read the contributing guide from SB3-Contrib, it explains how to test new algorithms). It ...