
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Results: 192 stable-baselines3 issues

### 🐛 Bug When `EpisodicLifeEnv` triggers a reset due to the end of lives, it takes a no-op action to "restart" the game. This no-op action may cause the actual...

bug
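To make the report concrete, here is a minimal sketch of the episodic-life idea, assuming a gym-style `reset()`/`step()` interface. `MockGame` and `NOOP` are illustrative stand-ins, not SB3 code; the point is the single no-op step that the wrapper issues on a life-loss "reset" instead of a true `env.reset()`, which is the action the bug report says can affect the actual game state.

```python
NOOP = 0

class MockGame:
    """Toy 'game' with 2 lives; every action costs one life (illustrative)."""
    def __init__(self):
        self.lives = 0
    def reset(self):
        self.lives = 2
        return "obs"
    def step(self, action):
        self.lives -= 1
        return "obs", 0.0, self.lives == 0, {}

class EpisodicLifeWrapper:
    """Report an episode end on each life loss; only truly reset the game
    when all lives are gone. On an in-game 'reset' it takes one no-op step
    to move past the life-loss state -- the step at issue in this report."""
    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True
    def reset(self):
        if self.was_real_done:
            obs = self.env.reset()
        else:
            # no-op step instead of a full reset; this still advances the game
            obs, _, done, _ = self.env.step(NOOP)
            if done:
                obs = self.env.reset()
        self.lives = self.env.lives
        return obs
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        if 0 < self.env.lives < self.lives:
            done = True  # fake episode end on life loss
        self.lives = self.env.lives
        return obs, reward, done, info
```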

### Question When I use Stable Baselines3 with my custom environment, I have found that even though the reward during training is quite high, the reward during evaluation is low....

question
custom gym env
more information needed
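One common cause of this gap (a hedged guess, since the issue is truncated): training acts stochastically, while evaluation is often run with `deterministic=True`, i.e. the argmax action. The toy sketch below (not SB3 code; the one-step environment and probabilities are made up) shows that when the argmax action is not actually the best one, the deterministic evaluation return drops below the stochastic training return.

```python
import random

ACTION_REWARD = {0: 0.0, 1: 1.0}   # toy one-step environment
ACTION_PROBS = [0.6, 0.4]          # the policy's action distribution

def act(deterministic, rng):
    if deterministic:
        # argmax action, as in evaluation with deterministic=True
        return max(range(2), key=lambda a: ACTION_PROBS[a])
    # sampled action, as during training
    return rng.choices(range(2), weights=ACTION_PROBS)[0]

def mean_reward(deterministic, episodes=10_000, seed=0):
    rng = random.Random(seed)
    total = sum(ACTION_REWARD[act(deterministic, rng)] for _ in range(episodes))
    return total / episodes

# stochastic ("training-like") return is about 0.4, deterministic is 0.0
```

Evaluating with `deterministic=False` (or checking environment wrappers such as observation normalization) is usually the first thing to try for this symptom.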

### 🚀 Feature An option to collect rollouts for `n_episodes` instead of `n_steps` for on-policy algorithms. ### Motivation Some environments, like games, have the most important reward at the...

enhancement

### 🚀 Feature Add logger.close() to StopTrainingOnMaxEpisodes class. ### Motivation While working with this amazing tool, I had some problems training models with large timestep counts, so the StopTrainingOnMaxEpisodes...

enhancement
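The shape of the requested change, as a hedged sketch: the real `StopTrainingOnMaxEpisodes` subclasses SB3's `BaseCallback`, but the essence is an episode counter that signals a stop, plus the cleanup call the issue asks for. `StopOnMaxEpisodes` here is an illustrative stand-in.

```python
class StopOnMaxEpisodes:
    """Count finished episodes; stop training (return False) at the cap,
    closing the logger on the way out -- the addition the issue proposes."""
    def __init__(self, max_episodes, logger=None):
        self.max_episodes = max_episodes
        self.logger = logger          # anything with a .close() method
        self.episodes = 0
    def on_step(self, done):
        if done:
            self.episodes += 1
        keep_going = self.episodes < self.max_episodes
        if not keep_going and self.logger is not None:
            self.logger.close()       # flush/release resources on stop
        return keep_going
```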

### 🚀 Feature This issue is to discuss the possibility of including tests in the type checking CI pipeline. ### Motivation Currently, tests are not being type checked, which means...

enhancement
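Assuming mypy is the type checker in question, including the tests is typically a one-line configuration change; the paths below are illustrative, not the repo's actual CI setup.

```ini
# mypy.ini (illustrative) -- check the tests directory as well as the package
[mypy]
files = stable_baselines3, tests
ignore_missing_imports = True
```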

### 🚀 Feature Currently `EvalCallback` seems to save only one best_model. Add support for saving multiple best models (say, 5). ### Motivation In my own environment, from the eval curves in...

enhancement
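Keeping the k best checkpoints is a standard top-k pattern; here is a hedged sketch of how it could bolt onto an evaluation callback. `TopKSaver` and `save_fn` (a stand-in for `model.save(path)`) are hypothetical, not SB3 API.

```python
import heapq

class TopKSaver:
    """Keep the k checkpoints with the highest evaluation reward."""
    def __init__(self, k, save_fn):
        self.k = k
        self.save_fn = save_fn   # e.g. model.save, called with a path
        self.heap = []           # min-heap of (reward, path): worst kept on top
        self.counter = 0
    def maybe_save(self, mean_reward):
        if len(self.heap) < self.k or mean_reward > self.heap[0][0]:
            path = f"best_model_{self.counter}.zip"
            self.counter += 1
            self.save_fn(path)
            heapq.heappush(self.heap, (mean_reward, path))
            if len(self.heap) > self.k:
                heapq.heappop(self.heap)  # drop the worst kept checkpoint
    def best_paths(self):
        return [path for _, path in sorted(self.heap, reverse=True)]
```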

### 🐛 Bug The return type of methods `.load()` and `.learn()` in `BaseAlgorithm` is annotated as `"BaseAlgorithm"`, which means that for any subclass that does not override the methods with...

enhancement
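The standard fix for this annotation problem is a `TypeVar` bound to the base class, so that each subclass gets its own type back from `learn()`/`load()`. The class names below are illustrative, not the actual SB3 classes; on Python 3.11+ `typing.Self` achieves the same thing more directly.

```python
from typing import TypeVar

SelfAlgo = TypeVar("SelfAlgo", bound="Algo")

class Algo:
    def learn(self: SelfAlgo, total_timesteps: int) -> SelfAlgo:
        # ... training loop would go here ...
        return self

class PPOLike(Algo):
    pass

# With the TypeVar annotation, a type checker infers PPOLike,
# not Algo, for PPOLike().learn(1) -- so subclass-only attributes
# remain accessible on the result without a cast.
```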

## Description

## Motivation and Context

- [ ] I have raised an issue to propose this change ([required](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) for new features and bug fixes)

## Types of changes

-...

### 🚀 Feature - In `HerReplayBuffer`, initialize `self._buffer` considering the dtype of each input ### Motivation In the implementation of `HerReplayBuffer`, `self._buffer` is initialized with zeros of `np.float32`, which may lead...

duplicate
enhancement
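The requested change, sketched with NumPy: allocate each buffer entry with the dtype of the corresponding space rather than a blanket `np.float32` (which wastes memory for `uint8` image observations, for example). The `specs` dict here is an illustrative stand-in for what would come from the environment's observation and action spaces.

```python
import numpy as np

def init_buffer(capacity, specs):
    """specs maps key -> (shape, dtype), e.g. derived from env spaces."""
    return {
        key: np.zeros((capacity, *shape), dtype=dtype)
        for key, (shape, dtype) in specs.items()
    }

specs = {
    "observation": ((4,), np.uint8),   # e.g. image-like observations
    "action": ((2,), np.float32),
}
buffer = init_buffer(8, specs)
```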

### Question I am using stable-baselines3's implementation of HER with a custom environment, but I ran into problems in the reward computation step. The Gym environment is based on `GoalEnv`,...

documentation
question
custom gym env
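One detail that commonly trips up custom `GoalEnv` implementations with HER (a hedged guess at the truncated problem): during relabeling, `compute_reward` is called on whole batches of achieved/desired goals at once, so it must broadcast rather than assume a single goal pair. The distance-based reward below is illustrative, not the questioner's environment.

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, info=None, tol=0.05):
    """GoalEnv-style sparse reward: 0 within tolerance, -1 otherwise.
    Works for a single goal pair or a batch of pairs (HER relabeling)."""
    achieved = np.asarray(achieved_goal, dtype=np.float64)
    desired = np.asarray(desired_goal, dtype=np.float64)
    dist = np.linalg.norm(achieved - desired, axis=-1)
    return np.where(dist <= tol, 0.0, -1.0)
```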