stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
### 🐛 Bug

**check_env result**

> Traceback (most recent call last):
> File "D:\Thesis_\Test\PPonew.py", line 461, in
> check_env(env)
> File "C:\Users\Cr7th\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\env_checker.py", line 409, in check_env
> assert isinstance(
>...
This was found via torchfix (https://github.com/pytorch-labs/torchfix/): calling `torch.load` without the `weights_only` parameter is unsafe. Explicitly set `weights_only` to False only if you trust the data you load and full pickle functionality is needed,...
### 🐛 Bug

When calling

```python
from stable_baselines3.common.evaluation import evaluate_policy

def custom_callback(locals, globals):
    pass

evaluate_policy(callback=custom_callback)
```

with a VecEnv, the callback gets executed for each of the environments separately....
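Because the callback fires once per sub-environment, one way to cope is to key whatever the callback records by the sub-environment index. A minimal sketch, assuming (from SB3's current source, not its documented API) that `evaluate_policy` passes the loop's `locals()` with the sub-environment index named `i` and that step's reward named `reward`; `rewards_per_env` is a hypothetical container:

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical storage: rewards seen by the callback, grouped by sub-env index.
rewards_per_env: Dict[int, List[float]] = defaultdict(list)


def custom_callback(locals_: Dict[str, Any], globals_: Dict[str, Any]) -> None:
    # ASSUMPTION: `i` (sub-env index) and `reward` are local variables of the
    # evaluate_policy loop. This is an implementation detail that may change
    # between SB3 versions, so verify it against your installed version.
    env_idx = int(locals_["i"])
    rewards_per_env[env_idx].append(float(locals_["reward"]))
```

It would be passed as usual, e.g. `evaluate_policy(model, vec_env, n_eval_episodes=10, callback=custom_callback)`; grouping by index keeps the per-environment calls from being mixed together.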
## Description

Loading the data return value is unnecessary, since it is never used. Loading the data also causes a memory leak through the `ep_info_buffer` variable. I found this while loading...
### ❓ Question

Hi, I am looking into using ONNX with SB3. I have tested two models (A2C and PPO) on a custom environment using a `MultiInputActorCriticPolicy`. The...
### ❓ Question

Reinforcement learning, and the SB3 implementations, apply the typical constant discount factor gamma to future values when learning. This is fine for discrete-time environments where for each...
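For environments where steps take unequal amounts of time, a common alternative is to treat gamma as a per-unit-time factor and discount each step by $\gamma^{\Delta t_k}$, giving $G_k = r_k + \gamma^{\Delta t_k} G_{k+1}$. A minimal sketch of that return computation; it is not something SB3 implements, and the function name and the meaning of `durations` are assumptions:

```python
from typing import Sequence


def discounted_return_variable_dt(
    rewards: Sequence[float],
    durations: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Discounted return where the discount between steps depends on elapsed time.

    ASSUMPTIONS (not SB3 behavior): `gamma` is the discount per unit of time,
    and durations[k] is the time elapsed between step k and step k + 1.
    """
    if len(rewards) != len(durations):
        raise ValueError("rewards and durations must have the same length")
    ret = 0.0
    # Walk backwards: G_k = r_k + gamma ** dt_k * G_{k+1}
    for reward, dt in zip(reversed(rewards), reversed(durations)):
        ret = reward + (gamma ** dt) * ret
    return ret
```

With every duration equal to 1 this reduces to the usual constant-gamma return, so it only changes behavior when step durations actually differ.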
### 🐛 Bug

When using `SubprocVecEnv` from stable-baselines3,

```python
env = make_vec_env(lambda: env_creator3(env_config), n_envs=n_envs, vec_env_cls=SubprocVecEnv)
```

the seeds are automatically set in a sequential manner starting from a base seed,...
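If sequential seeds are a concern, NumPy's `SeedSequence` can derive independent, non-sequential child seeds from a single base seed. A minimal sketch; `spawn_env_seeds` is a hypothetical helper, and how the resulting integers are wired into each sub-environment (for instance inside a per-rank env factory) is an assumption that depends on your setup and SB3 version:

```python
from typing import List

import numpy as np


def spawn_env_seeds(base_seed: int, n_envs: int) -> List[int]:
    """Derive statistically independent, non-sequential seeds from one base seed.

    ASSUMPTION: each sub-environment is later seeded individually with one of
    the returned integers; this is not how make_vec_env seeds envs by default.
    """
    children = np.random.SeedSequence(base_seed).spawn(n_envs)
    # generate_state(1) yields one uint32 word per child; convert to a Python int.
    return [int(child.generate_state(1)[0]) for child in children]
```

The output is reproducible for a given base seed, while the children's streams are decorrelated rather than adjacent integers.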
## Description

In `RolloutBuffer.compute_returns_and_advantage`, a NumPy array with dtype `bool` is used as an operand for subtraction with a Python scalar. This relies on automatic casting rules which PyTorch...
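The subtraction in question sits inside the generalized advantage estimation (GAE) recursion. A minimal NumPy sketch of that recursion, written independently of SB3 (the name `compute_gae` and its argument layout are assumptions modeled on the description above), that casts the boolean masks to `float32` explicitly before computing `1.0 - mask`, so no implicit bool promotion is involved:

```python
from typing import Tuple

import numpy as np


def compute_gae(
    rewards: np.ndarray,         # shape (n_steps, n_envs)
    values: np.ndarray,          # shape (n_steps, n_envs)
    episode_starts: np.ndarray,  # shape (n_steps, n_envs), dtype bool
    last_values: np.ndarray,     # shape (n_envs,), value estimate after the last step
    dones: np.ndarray,           # shape (n_envs,), dtype bool
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[np.ndarray, np.ndarray]:
    n_steps = rewards.shape[0]
    # Explicit casts: subtraction below only ever sees float arrays.
    starts_f = episode_starts.astype(np.float32)
    dones_f = dones.astype(np.float32)
    advantages = np.zeros_like(rewards, dtype=np.float32)
    last_gae_lam = np.zeros_like(last_values, dtype=np.float32)
    for step in reversed(range(n_steps)):
        if step == n_steps - 1:
            next_non_terminal = 1.0 - dones_f
            next_values = last_values
        else:
            next_non_terminal = 1.0 - starts_f[step + 1]
            next_values = values[step + 1]
        # TD error, with bootstrapping cut off at episode boundaries.
        delta = rewards[step] + gamma * next_values * next_non_terminal - values[step]
        last_gae_lam = delta + gamma * gae_lambda * next_non_terminal * last_gae_lam
        advantages[step] = last_gae_lam
    returns = advantages + values
    return advantages, returns
```

Casting once up front keeps the result dtype predictable under both NumPy's and PyTorch's promotion rules.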
### 🚀 Feature

Currently, SB3 algorithms let you set the number of gradient steps to $-1$, which translates into the number of timesteps collected in the rollout, let's call...
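The current behavior described above amounts to a simple mapping from the configured value to the number of updates. A minimal sketch; `resolve_gradient_steps` is a hypothetical helper that mirrors the described semantics, not SB3's own code:

```python
def resolve_gradient_steps(gradient_steps: int, rollout_timesteps: int) -> int:
    """Map a configured gradient-step count to the number of updates to run.

    A negative value (-1) means "as many gradient steps as timesteps
    collected in the last rollout"; otherwise the value is used as-is.
    """
    return gradient_steps if gradient_steps >= 0 else rollout_timesteps
```

Any proposed alternative would slot in at this point, replacing the `-1` branch with a different rule.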
### ❓ Question

**Q**: I found that when running DQN, the logged values of `ep_len_mean` and `ep_rew_mean` are the same. Why does this happen? How can I solve this? By running the example code:...
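One common explanation: if the environment pays a reward of +1 on every step (CartPole-v1 does, including the terminating step), each episode's return equals its length, so the two logged means coincide and nothing needs fixing. The example code above is truncated, so whether this applies is an assumption. A minimal simulation; `run_episode` and the episode lengths are hypothetical:

```python
from typing import Tuple


def run_episode(episode_length: int, reward_per_step: float = 1.0) -> Tuple[int, float]:
    """Simulate an episode that pays a constant reward every step (hypothetical)."""
    total_reward = 0.0
    for _ in range(episode_length):
        total_reward += reward_per_step
    return episode_length, total_reward


lengths, returns = zip(*(run_episode(n) for n in [12, 30, 21]))
ep_len_mean = sum(lengths) / len(lengths)
ep_rew_mean = sum(returns) / len(returns)
print(ep_len_mean, ep_rew_mean)  # prints "21.0 21.0"
```

With any per-step reward other than a constant 1, the two statistics would diverge.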