stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Results 192 stable-baselines3 issues

### 🐛 Bug I tested different implementations of the PPO algorithm and found some discrepancies among the implementations. I tested each implementation on 56 Atari environments, with five trials per...

question

### 🚀 Feature Hello guys, After watching this video: https://www.youtube.com/watch?v=WoLlZLdoEQk I had the idea to extend the NatureCNN to NatureCTN1D this way: ``` class Chomp1d(nn.Module): def __init__(self, chomp_size):...

enhancement

### 🐛 Bug Hey, thanks a lot for your work! I am trying to debug an apparent memory leak/higher memory usage when running the training code multiple times, but I...

bug

### ❓ Question Hello, I am modifying an environment at selected training milestones, at the end of rollouts. After these modifications I want any episode cut short when the rollout...

question

### 🚀 Feature When are you planning to upgrade to Gymnasium v1.0.0 https://github.com/Farama-Foundation/Gymnasium/releases/tag/v1.0.0 ### Motivation _No response_ ### Pitch _No response_ ### Alternatives _No response_ ### Additional context _No response_...

duplicate
enhancement

### ❓ Question In the TD3 algorithm, why are the next_actions limited during training? If my action range is much larger than [-1, 1], the data is truncated https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/td3/td3.py#L171 ### Checklist - [X] I have...

question
custom gym env
RTFM
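
The TD3 question above turns on which space the clamp is applied in. A minimal sketch, assuming (this is an assumption, not something the issue text confirms) that the agent stores and learns on actions rescaled to [-1, 1] and maps them back to the environment range only when acting. Under that assumption a clamp to [-1, 1] covers the full environment range rather than truncating it. The helper names `to_unit_range`, `from_unit_range`, and `smoothed_target_action` are hypothetical and not taken from the library.

```python
import random


def to_unit_range(action, low, high):
    """Affinely map an action from [low, high] to [-1, 1]."""
    return 2.0 * (action - low) / (high - low) - 1.0


def from_unit_range(scaled, low, high):
    """Affinely map an action from [-1, 1] back to [low, high]."""
    return low + 0.5 * (scaled + 1.0) * (high - low)


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5, rng=random):
    """TD3-style target policy smoothing, done in the normalized space.

    Gaussian noise is clipped to [-noise_clip, noise_clip], added to the
    target action, and the sum is clamped to [-1, 1].
    """
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(-1.0, min(1.0, target_action + noise))


# Hypothetical environment whose actions live in [-10, 10].
low, high = -10.0, 10.0
env_action = 7.5

scaled = to_unit_range(env_action, low, high)
next_scaled = smoothed_target_action(scaled, rng=random.Random(0))
next_env = from_unit_range(next_scaled, low, high)

print(scaled, next_scaled, next_env)
```

If the normalization assumption holds, the clamp bounds the noisy target action to the valid action range, which is the role the clip plays in target policy smoothing; it would only discard information if raw, unscaled actions were fed to it.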

### 🚀 Feature Suggest updating the stable version of the jax API and the algorithm API for gym ### Motivation _No response_ ### Pitch _No response_ ### Alternatives _No response_...

enhancement
more information needed

### ❓ Question Question in the title: Can stable-baselines3 be installed through pip without cuda dependencies? Is the CPU only docker image the only alternative? ### Checklist - [X] I...

question

### ❓ Question Thank you very much for creating such an excellent tool. I am currently using the PPO algorithm in Stable-Baselines3 (SB3) for training in a custom environment. During...

question
custom gym env
more information needed
check the checklist

In the [paper](https://arxiv.org/abs/1812.05905#:~:text=In%20this%20paper,%20we%20describe%20Soft%20Actor-Critic%20(SAC),%20our%20recently), equation (18), the entropy coefficient is used directly, while in the sb3 implementation its logarithm is used ([here](https://github.com/DLR-RM/stable-baselines3/blob/512eea923afad6f6da4bb53d72b6ea4c6d856e59/stable_baselines3/sac/sac.py#L231)). This way, the temperature coefficient used in the critic...

documentation
duplicate
question
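
The SAC question above contrasts updating the entropy coefficient directly with updating its logarithm. A minimal sketch of one commonly cited difference between the two parameterizations, not a statement of the library's rationale: for the same sample, both gradients have the same sign, but a plain gradient step on the coefficient itself can make it negative, while a step on its logarithm always yields a positive coefficient after exponentiation. The function names and the numbers used here are hypothetical.

```python
import math


def step_alpha_direct(alpha, log_prob, target_entropy, lr):
    """One gradient step on the loss -alpha * (log_prob + target_entropy)."""
    grad = -(log_prob + target_entropy)  # derivative with respect to alpha
    return alpha - lr * grad


def step_log_alpha(log_alpha, log_prob, target_entropy, lr):
    """One gradient step on the loss -log_alpha * (log_prob + target_entropy)."""
    grad = -(log_prob + target_entropy)  # derivative with respect to log_alpha
    return log_alpha - lr * grad


# Hypothetical sample where the policy entropy (-log_prob = 3) is above the
# target (-1), so the coefficient should shrink.
log_prob, target_entropy, lr = -3.0, -1.0, 0.5

alpha_direct = step_alpha_direct(1.0, log_prob, target_entropy, lr)
log_alpha_new = step_log_alpha(0.0, log_prob, target_entropy, lr)
alpha_from_log = math.exp(log_alpha_new)

print(alpha_direct)    # direct update overshoots below zero
print(alpha_from_log)  # exponentiated update stays positive
```

Both updates move the coefficient in the same direction; they differ in step size and in whether positivity is guaranteed, so the two objectives are related but not identical.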