stable-baselines3 issues

Scaling Environment

6

### 🐛 Bug **check_env result**

> Traceback (most recent call last):
> File "D:\Thesis_\Test\PPonew.py", line 461, in
> check_env(env)
> File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\env_checker.py", line 409, in check_env
> assert isinstance(
> ...

Hamza-101

custom gym env

check the checklist

`torch.load` without `weights_only` parameter is unsafe

18

This is found via https://github.com/pytorch-labs/torchfix/ `torch.load` without `weights_only` parameter is unsafe. Explicitly set `weights_only` to False only if you trust the data you load and full pickle functionality is needed,...

kit1980

enhancement
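
A minimal sketch of the safer loading pattern the issue refers to, assuming a checkpoint that holds only tensors in plain containers. The helper name `save_then_load_weights` and the in-memory buffer are illustrative and do not come from the issue; whether a given SB3 checkpoint can be loaded this way is not stated in the excerpt.

```python
import io

import torch


def save_then_load_weights(state):
    """Round-trip a dict of tensors through torch.save / torch.load."""
    buffer = io.BytesIO()
    torch.save(state, buffer)
    buffer.seek(0)
    # weights_only=True restricts unpickling to tensors and plain containers,
    # so arbitrary pickled objects in the file are rejected.
    return torch.load(buffer, map_location="cpu", weights_only=True)


state = {"weight": torch.ones(2, 3), "bias": torch.zeros(3)}
restored = save_then_load_weights(state)
```

Passing `weights_only=False` explicitly remains an option when the file is trusted and full pickle functionality is needed, as the issue text notes.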

[Bug]: evaluate_policy called multiple times for vectorized environments

5

### 🐛 Bug When calling

```python
from stable_baselines3.common.evaluation import evaluate_policy


def custom_callback(locals, globals):
    pass


evaluate_policy(callback=custom_callback)
```

with a vecenv, the callback gets executed for each of the environments separately....

LukasFehring

documentation

help wanted
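
One possible way to cope with the per-environment calls is to let the callback filter on the environment index. The sketch below assumes the `locals` dict handed to the callback exposes that index under the key `"i"`; this is an assumption about `evaluate_policy` internals and should be checked against the installed SB3 version. The names `first_env_only_callback` and `calls_per_env` are hypothetical, and the last line only simulates the calls instead of running a real evaluation.

```python
from collections import Counter

calls_per_env = Counter()


def first_env_only_callback(locals_, globals_):
    """Record every call, but report real work only for env index 0."""
    env_index = locals_.get("i", 0)  # assumed key; verify for your SB3 version
    calls_per_env[env_index] += 1
    # The return value is only used by this sketch to show which calls
    # were treated as "real" ones.
    return env_index == 0


# Simulate what one step of a 3-env vectorized evaluation might pass in.
handled = [first_env_only_callback({"i": i}, {}) for i in range(3)]
```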

Fix memory leak in base_class.py

6

## Description Loading the data return value is not necessary since it is unused. Loading the data causes a memory leak through the ep_info_buffer variable. I found this while loading...

peteole

Exporting MultiInputActorCriticPolicy as ONNX

6

### ❓ Question Hi, I am looking into the use of ONNX with SB3. I have tested 2 models (A2C and PPO) on a custom environment using a MultiInputActorCriticPolicy. The...

MaximCamilleri

question

more information needed

[Question] How to pass a varying gamma to DQN or PPO during training?

6

### ❓ Question Reinforcement learning and the SB3 implementations apply the typical constant gamma for discounting future values when learning. This is fine for discrete time environments where for each...

rariss

question
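
To make the question concrete, here is a small sketch of a discounted return computed with a per-step discount factor instead of a constant one. It is not an SB3 feature. The idea of deriving each step's factor as `base_gamma ** dt` from a step duration `dt` is an assumption about what "varying gamma" could mean, and all names are illustrative.

```python
def discounted_return(rewards, gammas):
    """Sum of rewards, each weighted by the product of all earlier gammas."""
    total, discount = 0.0, 1.0
    for reward, gamma in zip(rewards, gammas):
        total += discount * reward
        discount *= gamma
    return total


base_gamma = 0.5
rewards = [1.0, 1.0, 1.0]

# Constant gamma: every step is discounted by the same factor.
constant = discounted_return(rewards, [base_gamma] * 3)

# Varying gamma (assumed rule): a step lasting dt time units uses gamma ** dt.
durations = [1, 2, 1]
varying = discounted_return(rewards, [base_gamma ** dt for dt in durations])
```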

SubprocVecEnv Sets Out-of-Range Seeds for My Environments (ScenarioNet Environment)

7

### 🐛 Bug When using SubprocVecEnv from stable-baselines3,

```
env = make_vec_env(lambda: env_creator3(env_config), n_envs=n_envs, vec_env_cls=SubprocVecEnv)
```

the seeds are automatically set in a sequential manner starting from a base seed,...

chrisgao99

custom gym env
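
The sequential seeding described above, and one way an environment could fold such seeds back into its accepted range, can be sketched in plain Python. Both helpers are hypothetical and the range `[0, 100)` is made up for illustration; the excerpt does not say which range the ScenarioNet environment accepts.

```python
def sequential_seeds(base_seed, n_envs):
    """Seeds assigned one after another, starting from base_seed."""
    return [base_seed + rank for rank in range(n_envs)]


def wrap_seed(seed, low, high):
    """Map any integer seed into the half-open range [low, high)."""
    span = high - low
    if span <= 0:
        raise ValueError("high must be greater than low")
    return low + (seed - low) % span


seeds = sequential_seeds(base_seed=1000, n_envs=4)
wrapped = [wrap_seed(seed, low=0, high=100) for seed in seeds]
```

A wrapper around the environment's reset could apply something like `wrap_seed` before forwarding the seed, at the cost of possible collisions between different incoming seeds.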

Avoid torch type-error under torch.compile

## Description In `RolloutBuffer.compute_returns_and_advantage` a NumPy array with dtype bool is used as an operand in a subtraction with a Python scalar. This relies on some automatic casting rules which PyTorch...

amjames
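
The casting issue can be illustrated with NumPy alone. This is a sketch of the general pattern, not the actual diff of the pull request; the variable names are illustrative.

```python
import numpy as np

dones = np.array([False, True, False])

# Implicit: relies on NumPy promoting the bool array during subtraction.
implicit = 1.0 - dones

# Explicit: cast to a float dtype first, so no promotion rule is involved.
explicit = 1.0 - dones.astype(np.float32)
```

Both expressions give the same values under NumPy; the explicit cast simply removes the dependence on automatic promotion of a bool operand.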

[Feature Request] Allow users to define gradient steps as a fraction of rollout time-steps

1

### 🚀 Feature Currently, SB3 algorithms allow you to define the number of gradient steps $= -1$, which will translate into the number of timesteps in the rollout, let's call...

janakact

enhancement

check the checklist
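
A rough sketch of how the requested behaviour might be resolved into a concrete step count. The function `resolve_gradient_steps` and the rule "a float in (0, 1] is a fraction of the rollout" are assumptions made for illustration; they are not part of SB3 and the issue excerpt does not specify the exact semantics.

```python
def resolve_gradient_steps(gradient_steps, rollout_timesteps):
    """Turn a gradient_steps setting into a concrete number of steps."""
    if gradient_steps == -1:
        # Existing convention from the issue: -1 means "as many as the rollout".
        return rollout_timesteps
    if isinstance(gradient_steps, float):
        # Hypothetical extension: a float is read as a fraction of the rollout.
        if not 0.0 < gradient_steps <= 1.0:
            raise ValueError("fraction must be in (0, 1]")
        return max(1, int(gradient_steps * rollout_timesteps))
    if gradient_steps > 0:
        return gradient_steps
    raise ValueError("gradient_steps must be -1, a positive int, or a fraction")


same_as_rollout = resolve_gradient_steps(-1, 64)
quarter = resolve_gradient_steps(0.25, 64)
fixed = resolve_gradient_steps(8, 64)
```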

[Question] The error about DQN--ep_len_mean&ep_rew_mean output

1

### ❓ Question **q**: I found that when running DQN, the values of ep_len_mean and ep_rew_mean are the same. Why does this happen? How can I solve this? By running the example code:...

AnnyOrange

question

openai gym
