stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
### 🐛 Bug I tested different implementations of the PPO algorithm and found some discrepancies among the implementations. I tested each implementation on 56 Atari environments, with five trials per...
### 🚀 Feature Hello guys, after watching this video: https://www.youtube.com/watch?v=WoLlZLdoEQk I had the idea to extend the NatureCNN to NatureCTN1D this way: ``` class Chomp1d(nn.Module): def __init__(self, chomp_size):...
### 🐛 Bug Hey, thanks a lot for your work! I am trying to debug an apparent memory leak/higher memory usage when running the training code multiple times, but I...
### ❓ Question Hello, I am modifying an environment at selected training milestones, at the end of rollouts. After these modifications, I want any episode to be cut short when the rollout...
### 🚀 Feature When are you planning to upgrade to Gymnasium v1.0.0? https://github.com/Farama-Foundation/Gymnasium/releases/tag/v1.0.0 ...
### ❓ Question TD3 algorithm: during training, why are the next_actions clipped? If my action range is much larger than [-1, 1], the data is truncated. https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/td3/td3.py#L171 ### Checklist - [X] I have...
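A minimal scalar sketch of why clipping to [-1, 1] need not truncate a wide action range, assuming a `Box` action space where the policy works in a normalized [-1, 1] space and actions are mapped back to the environment bounds before stepping. The two rescaling formulas mirror the ones used in SB3's `BasePolicy.scale_action` / `unscale_action`; `smoothed_target_action` is a hypothetical helper illustrating TD3 target policy smoothing with the default noise values (0.2 std, clipped at 0.5), not SB3's tensor code.

```python
import random


def scale_action(action: float, low: float, high: float) -> float:
    """Map an env-space action in [low, high] to the normalized range [-1, 1]."""
    return 2.0 * ((action - low) / (high - low)) - 1.0


def unscale_action(scaled_action: float, low: float, high: float) -> float:
    """Map a normalized action in [-1, 1] back to the env range [low, high]."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)


def smoothed_target_action(
    actor_output: float,
    rng: random.Random,
    target_policy_noise: float = 0.2,
    target_noise_clip: float = 0.5,
) -> float:
    """Hypothetical helper: TD3 target policy smoothing in the normalized space."""
    # Gaussian noise, clipped to [-target_noise_clip, target_noise_clip].
    noise = rng.gauss(0.0, target_policy_noise)
    noise = max(-target_noise_clip, min(target_noise_clip, noise))
    # Clamp to [-1, 1]: these are the bounds of the *normalized* action space.
    return max(-1.0, min(1.0, actor_output + noise))


if __name__ == "__main__":
    low, high = -100.0, 100.0  # hypothetical wide environment bounds
    rng = random.Random(0)
    a = smoothed_target_action(0.9, rng)
    print(a, unscale_action(a, low, high))
```

Under these assumptions, a normalized value of 1.0 maps back to 100.0 in the environment, so the clamp bounds the normalized value rather than the physical range.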
### 🚀 Feature I suggest updating the stable version of the JAX API and the algorithm API for Gym...
### ❓ Question Question in the title: can stable-baselines3 be installed through pip without CUDA dependencies? Is the CPU-only Docker image the only alternative? ### Checklist - [X] I...
### ❓ Question Thank you very much for creating such an excellent tool. I am currently using the PPO algorithm in Stable-Baselines3 (SB3) for training in a custom environment. During...
In the [paper](https://arxiv.org/abs/1812.05905), equation (18), the entropy coefficient is used directly, while in the SB3 implementation its logarithm is used ([here](https://github.com/DLR-RM/stable-baselines3/blob/512eea923afad6f6da4bb53d72b6ea4c6d856e59/stable_baselines3/sac/sac.py#L231)). This way, the temperature coefficient used in the critic...
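The two parameterizations can be compared through their gradients. This is a sketch assuming the SB3 temperature loss has the form `-(log_ent_coef * (log_prob + target_entropy).detach()).mean()` with `ent_coef = exp(log_ent_coef)`. Here $\beta$ stands for `log_ent_coef` and $\bar{\mathcal{H}}$ for the target entropy. It covers only the temperature update, not the critic update discussed above.

```latex
% Eq. (18), parameterized directly in \alpha
J(\alpha) = \mathbb{E}_{a_t \sim \pi_t}\left[-\alpha \log \pi_t(a_t \mid s_t) - \alpha \bar{\mathcal{H}}\right],
\qquad
\frac{\partial J}{\partial \alpha} = -\mathbb{E}_{a_t \sim \pi_t}\left[\log \pi_t(a_t \mid s_t) + \bar{\mathcal{H}}\right]

% Log parameterization (assumed SB3 form): \alpha = e^{\beta}, bracket detached
L(\beta) = -\mathbb{E}_{a_t \sim \pi_t}\left[\beta \left(\log \pi_t(a_t \mid s_t) + \bar{\mathcal{H}}\right)\right],
\qquad
\frac{\partial L}{\partial \beta} = -\mathbb{E}_{a_t \sim \pi_t}\left[\log \pi_t(a_t \mid s_t) + \bar{\mathcal{H}}\right]

% For comparison, Eq. (18) differentiated with respect to \beta (chain rule)
\frac{\partial J}{\partial \beta} = \alpha \, \frac{\partial J}{\partial \alpha}
```

Under these assumptions, the gradients have the same sign and vanish at the same point, where the policy's expected entropy equals $\bar{\mathcal{H}}$. The log form drops the factor $\alpha$, which changes the effective step size, and keeps $\alpha = e^{\beta}$ positive.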