stable-baselines3 issues

[Question] Pong environment with A2C not learning with example code

1

### ❓ Question I copied the code from the Examples section in the documentation, which also uses a PongNoFrameskip-v4 environment with 4 stacked frames. The episodic mean reward starts out...

Tanis1304

question

[Question] An error while using SAC and DDPG

1

### ❓ Question PPO and A2C run fine, but after switching to DDPG and SAC the code no longer works: the error below is raised after the first episode ends. Why is my code generating this error? Traceback (most recent call last): File "D:\ps\anaconda\envs\metro-env1\lib\code.py", line 90, in runcode exec(code, self.locals) File "", line 1, in File...

minxuef

custom gym env

check the checklist

proposed fix for RunningMeanStd overflow

Connected to Issue https://github.com/DLR-RM/stable-baselines3/issues/1953 ## Description RunningMeanStd is made robust to overflows with two modifications: - the product that can produce overflows when `count` becomes too large is split into...

spiglerg

[Bug]: RunningMeanStd overflowing

1

### 🐛 Bug RunningMeanStd is not overflow safe, and overflows when running large-scale training (e.g., on a cluster). ### To Reproduce I'm submitting a pull request with a proposal to...

spiglerg

bug

check the checklist
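For readers unfamiliar with the problem in the two RunningMeanStd entries above, here is a minimal sketch of a batch-wise running mean/variance update written so that the two counts are never multiplied together directly. It is not the code from the pull request (whose description is truncated here) and not the library's implementation; the class name `RunningMoments` and the scalar, pure-Python form are assumptions made for illustration. The idea is to turn each count into a weight in [0, 1] first, so the cross term stays bounded even when the counts are very large.

```python
import statistics
from typing import Sequence


class RunningMoments:
    """Hypothetical scalar running mean/variance, merged batch by batch."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.var = 0.0
        self.count = 0.0

    def update(self, batch: Sequence[float]) -> None:
        """Fold one batch of samples into the running statistics."""
        if len(batch) == 0:
            return
        batch_mean = statistics.fmean(batch)
        batch_var = statistics.pvariance(batch, mu=batch_mean)
        self.update_from_moments(batch_mean, batch_var, float(len(batch)))

    def update_from_moments(
        self, batch_mean: float, batch_var: float, batch_count: float
    ) -> None:
        total = self.count + batch_count
        delta = batch_mean - self.mean
        # Divide before multiplying: both weights lie in [0, 1], so the
        # product count * batch_count is never formed explicitly.
        w_old = self.count / total
        w_new = batch_count / total
        self.mean = self.mean + delta * w_new
        # Same result as (var*count + batch_var*batch_count
        #                 + delta**2 * count * batch_count / total) / total
        self.var = w_old * self.var + w_new * batch_var + delta * delta * w_old * w_new
        self.count = total


if __name__ == "__main__":
    stats = RunningMoments()
    stats.update([1.0, 2.0, 3.0])
    stats.update([4.0, 5.0, 6.0, 7.0])
    print(stats.mean, stats.var, stats.count)
```

The weighted form is algebraically equivalent to the usual parallel-variance merge; only the order of operations changes.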

StopTrainingOnMaxEpisodes Assertion Error for 'dones' in locals

### ❓ Question Hi, I'm trying to run PPO but whenever I try to use StopTrainingOnMaxEpisodes, it gives me the assertion error: AssertionError: `dones` variable is not defined, please check...

KevinHan1209

question

more information needed

check the checklist

[Question] DQN optimizer parameters

1

### ❓ Question I have a question about the optimizer initialization process in `DQNPolicy`. While working on a custom DQN model, I noticed that when creating the optimizer, we pass...

rtkbv

enhancement

good first issue

help wanted

question

[Question] Running Multi-threaded PPO training independently with no interference

3

### ❓ Question I am trying to parallelise execution of PPO training on MuJoCo environments, where each multiprocessing thread uses a slightly modified xml file to train PPO with. For...

n-kish

question

[Bug]: No metrics logged when using wandb integrations

2

### 🐛 Bug When I use the wandb integration, no warning or error is reported, but no metrics are logged on the wandb website. I tried with other framework,...

XiaobenLi00

bug

custom gym env

check the checklist

[Feature Request] add_scalars to write func in TensorBoardOutputFormat in logger

2

### 🚀 Feature Add support for multi-variable logging in the logger module using add_scalars. Enable the logger to record and visualize multiple related scalar values simultaneously using a single record...

shimonShouei

enhancement
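As a rough illustration of what "multiple related scalars under a single record" could look like, the sketch below groups flat logger keys such as `loss/policy` and `loss/value` by their prefix. The helper `group_scalars` and the key layout are hypothetical and not part of the library's logger. Each resulting group has the shape that PyTorch's `SummaryWriter.add_scalars(main_tag, tag_scalar_dict, global_step)` accepts, but the sketch itself stays in the standard library and does not write anything to TensorBoard.

```python
from collections import defaultdict
from typing import Dict, Mapping


def group_scalars(
    flat: Mapping[str, float], sep: str = "/"
) -> Dict[str, Dict[str, float]]:
    """Group flat 'main/sub' keys into {main: {sub: value}} (hypothetical helper)."""
    groups: Dict[str, Dict[str, float]] = defaultdict(dict)
    for key, value in flat.items():
        main_tag, _, sub_tag = key.rpartition(sep)
        if not main_tag:
            # Key without a separator: keep it as its own single-entry group.
            main_tag, sub_tag = key, "value"
        groups[main_tag][sub_tag] = float(value)
    return dict(groups)


if __name__ == "__main__":
    record = {"loss/policy": 0.5, "loss/value": 1.25, "fps": 300}
    for main_tag, scalars in group_scalars(record).items():
        # With a real writer, each group could be passed as
        # writer.add_scalars(main_tag, scalars, global_step=step)
        print(main_tag, scalars)
```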

Add precommit config yaml and fix typos automatically

5

## Description Add a pre-commit-config yaml for the pre-commit message and fix its typos accordingly. There are two open points. ## Motivation and Context Automatically check codespell in pre-commit hooks, i.e....

cschindlbeck

PR template not filled
