stable-baselines3-contrib issues

70 stable-baselines3-contrib issues, sorted by most recently updated

[Question] RecurrentPPO: Reset LSTM states early?

### ❓ Question Hi and thanks for the great work! I am using RecurrentPPO in a current project and it strikes me that on [L294](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/25b43266e08ebe258061ac69688d94144799de75/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L294) the `self._last_lstm_states` added to the...

phisad

enhancement

question
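The question concerns when the carried LSTM state gets reset. A common pattern in recurrent PPO implementations is to multiply the incoming hidden state by `(1 - episode_start)`, so the state is zeroed exactly at episode boundaries. The pure-Python sketch below illustrates that pattern with a hypothetical scalar recurrent cell (`toy_rnn_step`, `rollout_hidden_states` are illustrative names, not sb3-contrib code):

```python
import math
from typing import List


def toy_rnn_step(h: float, x: float, w: float = 0.5) -> float:
    """Hypothetical scalar recurrent cell: h' = tanh(w * h + x)."""
    return math.tanh(w * h + x)


def rollout_hidden_states(
    xs: List[float], episode_starts: List[bool], h0: float = 0.0
) -> List[float]:
    """Run the toy cell over a sequence, zeroing the carried state
    whenever a new episode begins at that step."""
    h = h0
    outputs = []
    for x, start in zip(xs, episode_starts):
        # (1 - episode_start) is 0.0 at an episode boundary and 1.0 otherwise,
        # so the state from the previous episode never leaks into the new one.
        h = (1.0 - float(start)) * h
        h = toy_rnn_step(h, x)
        outputs.append(h)
    return outputs


# Steps 0 and 2 both start an episode, so they see the same (zeroed) state.
print(rollout_hidden_states([1.0, 1.0, 1.0], [True, False, True], h0=3.0))
```

Applying the mask before each step (rather than after) means the reset takes effect on the very first step of the new episode.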

[Question] Why does MaskablePPO not mask with some logic based on the last observation?

### ❓ Question In the `MaskablePPO` class, the way to get the masks is to ask the environment to provide them through the function `get_action_mask`. I can see that the `get_action_mask`...

EloyAnguiano

question
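Whatever the mask is computed from, invalid-action masking ultimately acts on the policy's action logits: logits of invalid actions are replaced by a very large negative number before the softmax, so those actions get probability zero. The pure-Python sketch below shows that general technique; it is not sb3-contrib's implementation, and the constant `HUGE_NEG` and the name `masked_softmax` are assumptions for illustration:

```python
import math
from typing import List, Sequence

# Stand-in for "minus infinity"; the exact value is an assumption for this sketch.
HUGE_NEG = -1e8


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over logits where actions with mask=False get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Replace invalid logits with a huge negative value.
    masked = [logit if valid else HUGE_NEG for logit, valid in zip(logits, mask)]
    # Subtract the max for numerical stability; exp(HUGE_NEG - peak) underflows to 0.0.
    peak = max(masked)
    exps = [math.exp(v - peak) for v in masked]
    total = sum(exps)
    return [e / total for e in exps]


print(masked_softmax([2.0, 1.0, 0.5], [True, False, True]))
```

Because the mask is an input to this step, it can be derived from any state the environment tracks (including the latest observation), as long as it is recomputed before each action is sampled.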

[Feature Request] Implement CrossQ

### 🚀 Feature I would like to implement CrossQ (https://openreview.net/pdf?id=PczQtTsTIX) in SB3, as also suggested by @araffin (https://github.com/araffin/sbx/pull/36#issuecomment-2027392759), ### Motivation CrossQ is one of the current state-of-the-art deep reinforcement learning...

danielpalen

enhancement

MaskablePPO Masking Doesn't Work with Big Action Space

### 🐛 Bug When I try to train my agent with a bigger action space (usually around 1400), I get the following error. I tried the solutions found in [https://github.com/DLR-RM/stable-baselines3/issues/1596](https://github.com/DLR-RM/stable-baselines3/issues/1596)...

orkunkn

custom gym env

check the checkboxes

RecurrentActorCriticPolicy Behaviour Not Clear

### 📚 Documentation I am trying to understand how the RecurrentActorCriticPolicy works. Coming from an NLP background, I am used to having tensors of the shape (batch_size, seq_len, feature_dim) as...

pasinit

documentation
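The shape question contrasts the batch-first layout common in NLP, (batch_size, seq_len, feature_dim), with the time-major layout (seq_len, batch_size, feature_dim) that PyTorch's `nn.LSTM` expects by default (`batch_first=False`). Assuming a flat rollout batch is a concatenation of `n_seq` equal-length sequences with each sequence's steps stored contiguously (an assumption about the buffer layout, not a statement of sb3-contrib's exact code), the conversion can be sketched in pure Python; `to_time_major` is a hypothetical helper:

```python
from typing import List

Row = List[float]


def to_time_major(flat: List[Row], n_seq: int) -> List[List[Row]]:
    """Split a flat (n_seq * seq_len, feature_dim) batch into n_seq contiguous
    sequences, then transpose to time-major order (seq_len, n_seq, feature_dim)."""
    if n_seq <= 0 or len(flat) % n_seq != 0:
        raise ValueError("batch size must be a positive multiple of n_seq")
    seq_len = len(flat) // n_seq
    # Batch-first view: sequences[i][t] is step t of sequence i.
    sequences = [flat[i * seq_len:(i + 1) * seq_len] for i in range(n_seq)]
    # Time-major view: result[t][i] is step t of sequence i.
    return [[sequences[i][t] for i in range(n_seq)] for t in range(seq_len)]


# Six 1-d feature rows forming two sequences of length 3.
print(to_time_major([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]], n_seq=2))
```

In the time-major view each outer index is one time step across all sequences, which is what lets a recurrent layer advance every sequence by one step at a time.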

TQC: ep_len_mean and ep_rew_mean do not match real values

### 🐛 Bug Hello, I am currently using TQC (sb3-contrib version: 2.3.0 / sb3 version: 2.3.2) with a custom environment on gymnasium (version 0.28.1) and Isaac Sim as a simulator....

btabia

bug

custom gym env

check the checkboxes

Dependent Actions in MultiDiscrete Action Space

### ❓ Question I'm currently working on a project with my team, developing a MaskablePPO reinforcement learning model with a MultiDiscrete action space. Since our action space is really large, we...

bbarisbaturay

question

Implemented CrossQ

This PR implements CrossQ (https://openreview.net/pdf?id=PczQtTsTIX), a novel off-policy deep RL algorithm that carefully uses batch normalisation and removes target networks to achieve state-of-the-art sample efficiency at a much lower computational...

danielpalen

[Question] How to do pre-training on the RecurrentPPO MlpLstmPolicy

### ❓ Question I have been using the https://github.com/HumanCompatibleAI/imitation/ library for imitation learning for sb3 PPO with great effect. However, my end goal is to do the same for RecurrentPPO....

iwishiwasaneagle

question

[Question] Masked actions PPO in multiagent setting using PettingZoo

### ❓ Question Hi, is it possible to train a Parallel PettingZoo environment using MaskablePPO? If so, how should "action_masks" be implemented? Thanks :) ### Checklist - [X] I...

MarcoPicione

question
