stable-baselines3-contrib
Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
### ❓ Question Hi and thanks for the great work! I am using RecurrentPPO in a current project and it strikes me that on [L294](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/25b43266e08ebe258061ac69688d94144799de75/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L294) the `self._last_lstm_states` added to the...
### ❓ Question In the `MaskablePPO` class, the masks are obtained by asking the environment to provide them through the function `get_action_mask`. I can see that the `get_action_mask`...
### 🚀 Feature I would like to implement CrossQ (https://openreview.net/pdf?id=PczQtTsTIX) in SB3, as also suggested by @araffin (https://github.com/araffin/sbx/pull/36#issuecomment-2027392759). ### Motivation CrossQ is one of the current state-of-the-art deep reinforcement learning...
### 🐛 Bug When I try to train my agent with a bigger action space (usually around 1400) I get the following error. I tried the solutions found in https://github.com/DLR-RM/stable-baselines3/issues/1596...
### 📚 Documentation I am trying to understand how the RecurrentActorCriticPolicy works. Coming from an NLP background, I am used to having tensors of the shape (batch_size, seq_len, feature_dim) as...
### 🐛 Bug Hello, I am currently using TQC (sb3 contrib version 2.3.0 / sb3 version 2.3.2) with a custom environment on gymnasium (version 0.28.1) and Isaac Sim as a simulator....
### ❓ Question I'm currently working on a project with my team, developing a MaskablePPO reinforcement learning model with a MultiDiscrete action space. Since our action space is really large, we...
This PR implements CrossQ (https://openreview.net/pdf?id=PczQtTsTIX), a novel off-policy deep RL algorithm that carefully uses batch normalisation and removes target networks to achieve state-of-the-art sample efficiency at a much lower computational...
### ❓ Question I have been using the https://github.com/HumanCompatibleAI/imitation/ library for imitation learning for sb3 PPO with great effect. However, my end goal is to do the same for RecurrentPPO....
### ❓ Question Hi, is it possible to train a Parallel PettingZoo environment using MaskablePPO? If so, how should "action_masks" be implemented? Thanks :) ### Checklist - [X] I...