stable-baselines3 issues

[Bug]: Possible inconsistencies with the PPO implementation

2

### 🐛 Bug I tested different implementations of the PPO algorithm and found some discrepancies among the implementations. I tested each implementation on 56 Atari environments, with five trials per...

rajdeepsh

question

[Feature Request] Temporal Convolutional network

1

### 🚀 Feature Hello, after watching this video: https://www.youtube.com/watch?v=WoLlZLdoEQk I had the idea to extend the NatureCNN to NatureCTN1D this way: ``` class Chomp1d(nn.Module): def __init__(self, chomp_size):...

tty666

enhancement

[Bug]: Higher memory usage on sequential training runs

6

### 🐛 Bug Hey, thanks a lot for your work! I am trying to debug an apparent memory leak/higher memory usage when running the training code multiple times, but I...

NickLucche

bug

[Question] How do I correctly manually reset the episode on a `rollout_end`?

2

### ❓ Question Hello, I am modifying an environment at selected training milestones, at the end of rollouts. After these modifications I want any episode cut short when the rollout...

npit

question

[Feature Request] When are you planning to upgrade to Gymnasium v1.0.0

1

### 🚀 Feature When are you planning to upgrade to Gymnasium v1.0.0? https://github.com/Farama-Foundation/Gymnasium/releases/tag/v1.0.0...

drulye

duplicate

enhancement

[Question] TD3 algorithm: during training, why limit the next_actions?

### ❓ Question TD3 algorithm: during training, why limit the next_actions? If my action range is much larger than [-1,1], the data is truncated https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/td3/td3.py#L171 ### Checklist - [X] I have...

Danny551

question

custom gym env

RTFM
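
A minimal sketch of one possible reading of the TD3 question above. It assumes that actions are stored and learned in a normalized [-1, 1] range and only mapped back to the environment's bounds when they are sent to the environment. Under that assumption, clamping the smoothed target action to [-1, 1] does not truncate a larger action range. The helper names and the noise values are illustrative and are not taken from the linked file.

```python
import random


def scale_action(action, low, high):
    """Map an environment action from [low, high] to the normalized range [-1, 1]."""
    return 2.0 * (action - low) / (high - low) - 1.0


def unscale_action(scaled, low, high):
    """Map a normalized action from [-1, 1] back to the environment range [low, high]."""
    return low + 0.5 * (scaled + 1.0) * (high - low)


def clip(x, lo, hi):
    """Clamp a scalar to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5, rng=random):
    """Target policy smoothing in normalized space (hypothetical helper).

    Clipped Gaussian noise is added to the target actor's output, and the
    result is clamped to [-1, 1], i.e. to the normalized action bounds.
    """
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    return clip(target_action + noise, -1.0, 1.0)


if __name__ == "__main__":
    low, high = -10.0, 10.0           # assumed environment action bounds
    normalized = scale_action(7.5, low, high)
    next_action = smoothed_target_action(normalized)
    # The clamp acts on the normalized value; the full range is recovered here.
    print(unscale_action(next_action, low, high))
```

If the environment's action space is not normalized or rescaled in this way, the assumption does not hold and the clamp would indeed cut off larger actions.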

[Feature Request] request title jax API and gymnax

1

### 🚀 Feature Suggest updating the stable version of the jax API and the algorithm API for gym...

quanmissq

enhancement

more information needed

[Question] Can stable-baselines3 be installed through pip without cuda dependencies? Is the CPU only docker image the only alternative?

1

### ❓ Question Question in the title: Can stable-baselines3 be installed through pip without cuda dependencies? Is the CPU only docker image the only alternative? ### Checklist - [X] I...

joemc94

question

[Question] Manually Controlling Actions During PPO Training

2

### ❓ Question Thank you very much for creating such an excellent tool. I am currently using the PPO algorithm in Stable-Baselines3 (SB3) for training in a custom environment. During...

wayne-weiwei

question

custom gym env

more information needed

check the checklist

[bug] Adaptive SAC: using logarithm of entropy coefficient to compute temperature objective instead of entropy coefficient

1

In the [paper](https://arxiv.org/abs/1812.05905#:~:text=In%20this%20paper,%20we%20describe%20Soft%20Actor-Critic%20(SAC),%20our%20recently), equation (18), the entropy coefficient is used directly, while in the sb3 implementation its logarithm is used ([here](https://github.com/DLR-RM/stable-baselines3/blob/512eea923afad6f6da4bb53d72b6ea4c6d856e59/stable_baselines3/sac/sac.py#L231)). This way, the temperature coefficient used in the critic...

Mattia-sony

documentation

duplicate

question
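
To make the two temperature objectives in the SAC report above concrete, here is a small single-sample sketch. It assumes the optimized parameter is `log_alpha` and compares the paper-style loss, which multiplies by the entropy coefficient itself, with the variant that multiplies by its logarithm. The function names and the sample values are hypothetical; a real implementation would average over a batch.

```python
import math


def loss_with_alpha(log_alpha, log_prob, target_entropy):
    """Paper-style objective: -alpha * (log_prob + target_entropy)."""
    return -math.exp(log_alpha) * (log_prob + target_entropy)


def loss_with_log_alpha(log_alpha, log_prob, target_entropy):
    """Variant described in the issue: -log(alpha) * (log_prob + target_entropy)."""
    return -log_alpha * (log_prob + target_entropy)


def grad_with_alpha(log_alpha, log_prob, target_entropy):
    """Derivative of loss_with_alpha with respect to log_alpha."""
    return -math.exp(log_alpha) * (log_prob + target_entropy)


def grad_with_log_alpha(log_alpha, log_prob, target_entropy):
    """Derivative of loss_with_log_alpha with respect to log_alpha."""
    return -(log_prob + target_entropy)


if __name__ == "__main__":
    target_entropy = -2.0             # illustrative value
    for log_prob in (-3.0, 0.5):
        for log_alpha in (-2.0, 0.0, 1.5):
            g_paper = grad_with_alpha(log_alpha, log_prob, target_entropy)
            g_log = grad_with_log_alpha(log_alpha, log_prob, target_entropy)
            # The two gradients differ only by the positive factor alpha.
            print(log_prob, log_alpha, g_paper, g_log)
```

With respect to `log_alpha`, the two gradients differ only by the positive factor `alpha`, so both variants push the coefficient in the same direction and differ in step size.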
