Antonin RAFFIN
## Description

## Motivation and Context

- [ ] I have raised an issue to propose this change ([required](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) for new features and bug fixes)

## Types of changes

-...
Recent approaches have proposed to enhance exploration using an intrinsic reward. Among the techniques:

- [Intrinsic Curiosity Module](https://arxiv.org/abs/1705.05363): uses the loss of a forward model on features (trained with an...
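
To make the "loss of a forward model on features" idea concrete, here is a minimal NumPy sketch. It assumes the feature vectors are already given and swaps in a plain linear forward model for illustration; the names `intrinsic_reward`, `W`, and `eta` are hypothetical and are not taken from any Stable-Baselines3 API.

```python
import numpy as np


def intrinsic_reward(phi_s, action_onehot, phi_next, W, eta=0.5):
    """Curiosity-style bonus: scaled squared error of a forward model.

    phi_s, phi_next: feature vectors of the current and next observation
                     (assumed to be given; feature learning is not shown).
    action_onehot:   one-hot encoded action.
    W:               weights of a hypothetical linear forward model mapping
                     [phi_s, action] to predicted next features.
    eta:             scaling factor for the bonus (illustrative default).
    """
    # Forward model input: current features concatenated with the action.
    inputs = np.concatenate([phi_s, action_onehot], axis=-1)
    predicted_next = inputs @ W
    # Poorly predicted transitions yield a larger bonus.
    return 0.5 * eta * np.sum((predicted_next - phi_next) ** 2, axis=-1)
```

The function also accepts batched inputs (a leading batch dimension on every array except `W`), in which case it returns one bonus per transition.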
SAC jax
## Description

Missing: benchmark and doc

Adapted from https://github.com/araffin/sbx

Report (3 seeds on 3 MuJoCo envs): https://wandb.ai/openrlbenchmark/cleanrl/reports/SAC-jax---VmlldzoyODM4MjU0

## Types of changes

- [ ] Bug fix
- [ ] New...
While reading https://github.com/vwxyzjn/cleanrl/blob/master/docs/rl-algorithms/sac.md and the code during the implementation of SAC with JAX (#300), I noticed that several tweaks have been made compared to the original SAC implementation, and I was wondering why and if...
### 🚀 Feature

Allow storing the git hash (à la W&B) and maybe running `pip freeze` to know which version of each package is used. Could be opt-in (log only...
This issue is a reminder. The v4 MuJoCo envs are much easier to install (using the new bindings), but the v5 ones are also in preparation (and will break several things).
New env support:

- [x] Isaac Lab (https://github.com/DLR-RM/rl-baselines3-zoo/pull/484)
- [ ] ~Env Pool (https://github.com/DLR-RM/rl-baselines3-zoo/issues/241)~ no longer maintained
- [x] Brax envs (https://github.com/DLR-RM/rl-baselines3-zoo/pull/484)
- [ ] DM Control (see https://gist.github.com/araffin/534f7f1506364eb824c2f4d6a2dd81d1)

Improvements:...
### 🚀 Feature

The current PPO trained agent (on the Hugging Face Hub) uses outdated hyperparameters (since https://github.com/DLR-RM/rl-baselines3-zoo/pull/335). A new agent needs to be retrained and pushed (and probably need to update...
### 🐛 Bug

I'm not sure if it's due to a specific version of Atari, but I remember having to add `terminal_on_life_loss: False` for PPO LSTM to prevent those hangs. I...
- upgraded stable-baselines
- add seeding and road generation
- add pretraining using behavior cloning
- add data-augmentation

TODO: tune pretraining