stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
### 🚀 Feature While instantiating an RL algorithm, e.g., PPO, a static or pre-trained model can be passed through as a `features_extractor`. ### Motivation Currently, one can customize the feature...
### ❓ Question Let's assume that I have already trained model A to predict A(x) given observation x. I would now like to train a model B using PPO to...
### 🚀 Feature Independently configurable learning rates for actor and critic in AC-style algorithms ### Motivation In the literature, the actor is often configured to learn more slowly, such that the critics...
### 🐛 Bug Using Procgen like in the [example from the docs website](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#sb3-and-procgenenv) results in: AssertionError: The algorithm only supports (, , , ) as action spaces but Discrete(15) was...
## Description Implementation of prioritized replay buffer for DQN. Closes #1242 ## Motivation and Context - [x] I have raised an issue to propose this change ([required](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) for new features...
### ❓ Question Hello, I have experience using Stable Baselines 3 as a module but am a beginner regarding its internal workings. I have a decent understanding of both multiagent...
## Description 1. Passed the appropriate elements from `self._seeds` and `self._options` to a "done" env that calls its `reset` function, in `DummyVecEnv.step_wait` 2. Added missing `options` argument in the `reset`...
### 📚 Documentation The documentation for all RL algorithms claims that the parameter `policy` has the type SACPolicy or ActorCriticPolicy or TD3Policy and so on: <https://stable-baselines3.readthedocs.io/en/master/modules/td3.html#parameters> Example: policy (TD3Policy) –...
### 🐛 Bug When I try to log hyperparameter-related metrics to TensorBoard, the metric values are not stored. ### To Reproduce ```python from stable_baselines3.common.logger import configure,...
### ❓ Question I have a simple use case that I have found no answer to in the documentation. (v. 2.2.1) I want to pass two chained callbacks as a...