stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
### 🐛 Bug When `EpisodicLifeEnv` triggers a reset due to the end of lives, it takes a no-op action to "restart" the game. This no-op action may cause the actual...
### Question When I use stable-baselines3 with my custom environment, I find that even though the reward during training is pretty high, the reward during evaluation is low....
### 🚀 Feature An option to collect rollouts for `n_episodes` instead of `n_steps` for on-policy algorithms. ### Motivation Some environments, like games, have the most important reward at the...
### 🚀 Feature Add `logger.close()` to the `StopTrainingOnMaxEpisodes` class. ### Motivation While working with this amazing tool, I had some problems training models with a large number of timesteps, so the `StopTrainingOnMaxEpisodes`...
### 🚀 Feature This issue is to discuss the possibility of including tests in the type checking CI pipeline. ### Motivation Currently, tests are not being type checked, which means...
### 🚀 Feature Currently, `EvalCallback` seems to save only one `best_model`. Add support for saving multiple best models (say, 5). ### Motivation In my own environment, from the eval curves in...
### 🐛 Bug The return type of methods `.load()` and `.learn()` in `BaseAlgorithm` is annotated as `"BaseAlgorithm"`, which means that for any subclass that does not override the methods with...
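The snippet above is truncated, so the following is only a minimal sketch of the general typing pattern it appears to concern, not the library's code. It assumes the problem is that a method annotated with the base class name makes a type checker treat subclass instances as the base class. The names `Algo`, `MyAlgo`, `SelfAlgo`, and `describe` are hypothetical and exist only for illustration.

```python
from typing import Type, TypeVar

# Hypothetical stand-ins for illustration; this is not stable-baselines3 code.
SelfAlgo = TypeVar("SelfAlgo", bound="Algo")


class Algo:
    def __init__(self) -> None:
        self.num_timesteps = 0

    # Annotating `self` with a bound TypeVar lets the checker infer
    # the subclass type at the call site instead of the base class.
    def learn(self: SelfAlgo, total_timesteps: int) -> SelfAlgo:
        self.num_timesteps += total_timesteps
        return self

    @classmethod
    def load(cls: Type[SelfAlgo], path: str) -> SelfAlgo:
        # A real loader would read from `path`; this sketch only
        # constructs a fresh instance of the calling class.
        return cls()


class MyAlgo(Algo):
    def describe(self) -> str:
        return "MyAlgo after {} steps".format(self.num_timesteps)


# The chained calls keep the static type `MyAlgo`, so `describe`
# type-checks without the subclass overriding `load` or `learn`.
model = MyAlgo.load("unused-path").learn(total_timesteps=10)
summary = model.describe()
```

With a plain `-> "Algo"` annotation the final line would still run, but a static checker would reject `describe` because `Algo` does not define it.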
## Description ## Motivation and Context - [ ] I have raised an issue to propose this change ([required](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) for new features and bug fixes) ## Types of changes -...
### 🚀 Feature - In `HerReplayBuffer`, initialize `self._buffer` considering the dtype of each input ### Motivation In the implementation of `HerReplayBuffer`, `self._buffer` is initialized with zeros of `np.float32`, which may lead...
### Question I am using stable-baselines3's implementation of HER with a custom environment, but I ran into problems in the reward computation step. The Gym environment is based on `GoalEnv`,...