stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code

Results 70 stable-baselines3-contrib issues

### ❓ Question Hi and thanks for the great work! I am using RecurrentPPO in a current project and it strikes me that on [L294](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/25b43266e08ebe258061ac69688d94144799de75/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L294) the `self._last_lstm_states` added to the...

enhancement
question

### ❓ Question In the `MaskablePPO` class, the change for getting the masks is to ask the environment to provide them through the function `get_action_mask`. I can see that the `get_action_mask`...

question

### 🚀 Feature I would like to implement CrossQ (https://openreview.net/pdf?id=PczQtTsTIX) in SB3, as also suggested by @araffin (https://github.com/araffin/sbx/pull/36#issuecomment-2027392759). ### Motivation CrossQ is one of the current state-of-the-art deep reinforcement learning...

enhancement

### 🐛 Bug When I try to train my agent with a bigger action space (usually around 1400) I get the following error. I tried the solutions found in https://github.com/DLR-RM/stable-baselines3/issues/1596...

custom gym env
check the checkboxes

### 📚 Documentation I am trying to understand how the RecurrentActorCriticPolicy works. Coming from an NLP background, I am used to having tensors of the shape (batch_size, seq_len, feature_dim) as...

documentation

### 🐛 Bug Hello, I am currently using TQC (sb3 contrib version 2.3.0/ sb3 version: 2.3.2) with a custom environment on gymnasium (version 0.28.1) and Isaac Sim as a simulator....

bug
custom gym env
check the checkboxes

### ❓ Question I'm currently working on a project with my team, developing a MaskablePPO reinforcement learning model with a MultiDiscrete action space. Since our action space is really large, we...

question

This PR implements CrossQ (https://openreview.net/pdf?id=PczQtTsTIX), a novel off-policy deep RL algorithm that carefully uses batch normalisation and removes target networks to achieve state-of-the-art sample efficiency at a much lower computational...

### ❓ Question I have been using the https://github.com/HumanCompatibleAI/imitation/ library for imitation learning for sb3 PPO with great effect. However, my end goal is to do the same for RecurrentPPO....

question

### ❓ Question Hi, is it possible to train a Parallel PettingZoo environment using MaskablePPO? If so, how should "action_masks" be implemented? Thanks :) ### Checklist - [X] I...

question