stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Results 192 stable-baselines3 issues

### 🚀 Feature Enable the user to save discounted return instead of accumulated reward in policy evaluation. ### Motivation Currently, EvalCallback and policy evaluation provide the user with accumulated rewards...

enhancement
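The difference between the two quantities named in this request can be sketched in a few lines of plain Python. This is only an illustration, not code from Stable Baselines3: the function names, the `gamma` parameter, and the sample rewards are assumptions made for the example.

```python
def accumulated_reward(rewards):
    """Undiscounted sum of the rewards collected in one episode."""
    return sum(rewards)


def discounted_return(rewards, gamma=0.99):
    """Discounted return G_0 = sum_t gamma**t * r_t for one episode.

    Computed backwards so each step is a single multiply-add.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode with three unit rewards.
episode_rewards = [1.0, 1.0, 1.0]
print(accumulated_reward(episode_rewards))            # 3.0
print(discounted_return(episode_rewards, gamma=0.5))  # 1.75
```

With `gamma=1.0` the two functions agree, which is why the accumulated reward can be seen as a special case of the discounted return.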

I created a custom gym environment and successfully trained a model using PPO. I then saved it using the model.save() method and got back a zip file (as per the...

question

This PR updates SB3 to Gymnasium v1.0, read the [release-notes](https://github.com/Farama-Foundation/Gymnasium/releases/tag/v1.0.0) to see all the changes. ## Motivation and Context Gymnasium is the core API used in SB3, therefore would be...

### 🚀 Feature I propose to include in Stable Baselines 3 an option to use **Lattice** exploration, an action noise that some colleagues and I have presented in [this](https://arxiv.org/abs/2305.20065) NeurIPS...

enhancement

### 🐛 Bug SB3 models never deallocate the VRAM they use in a process even if they are deleted. This can lead to hardware issues including full system crashes that...

custom gym env

### 🐛 Bug I am implementing a simple custom environment for using PPO with MultiDiscrete observation space. It works if I use MultiDiscrete([ 5, 2, 2 ]), but when it...

documentation
help wanted
custom gym env

### 📚 Documentation I've noticed a potential mismatch between the implementation and the documentation of the `sum_independent_dims` function in `stable_baselines3.common.distributions`. According to its docstring, the function is designed to handle...

documentation
help wanted

### 🚀 Feature 1. Save rendered images in `evaluate_policy()`. 2. Save rendered images in `EvalCallback()`. ### Motivation In a headless server, the simplest way to examine the **behavior** of a...

enhancement

### 🐛 Bug When running model.learn on a SubprocVecEnv as follows: ` env = make_vec_env(ENV_ID, n_envs=cpus, vec_env_cls=SubprocVecEnv, vec_env_kwargs=dict(start_method="spawn")) model = SAC(**kwargs, env=env) model.learn(N_TIMESTEPS, callback=eval_callback) ` the program ends up in...

custom gym env
check the checklist

### 🐛 Bug In the method `stable_baselines3.common.on_policy_algorithm.OnPolicyAlgorithm.learn` the `iteration` value is not updated in the `locals` dictionary while using callbacks. ### To Reproduce ```python from stable_baselines3 import PPO def callback_function(v_locals,...

bug
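The report above is truncated, so the actual cause inside Stable Baselines3 is not shown here. As a general illustration of how a value can go stale in a dictionary handed to callbacks, the following sketch uses plain Python only. The names `train`, `train_fixed`, and `shared_locals` are hypothetical and do not correspond to library code.

```python
def train(n_iterations, callback):
    iteration = 0
    # Snapshot of the local variables, taken once before the loop.
    # dict(...) makes an explicit copy, so later changes to `iteration`
    # are not reflected in it.
    shared_locals = dict(locals())
    while iteration < n_iterations:
        iteration += 1
        callback(shared_locals)


def train_fixed(n_iterations, callback):
    iteration = 0
    shared_locals = dict(locals())
    while iteration < n_iterations:
        iteration += 1
        # Refresh the entry before every callback invocation.
        shared_locals["iteration"] = iteration
        callback(shared_locals)


seen_stale, seen_fresh = [], []
train(3, lambda v_locals: seen_stale.append(v_locals["iteration"]))
train_fixed(3, lambda v_locals: seen_fresh.append(v_locals["iteration"]))
print(seen_stale)  # [0, 0, 0]
print(seen_fresh)  # [1, 2, 3]
```

In this sketch the callback only sees the updated counter when the dictionary entry is refreshed explicitly inside the loop.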