
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Results 192 stable-baselines3 issues

Originally posted by @PartiallyTyped in https://github.com/hill-a/stable-baselines/issues/821 " N-step returns allow for much better stability and improve performance when training DQN, DDPG, etc., so it will be quite useful to have...

enhancement
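To illustrate the idea behind the request, here is a minimal sketch (not SB3 code; the function name and signature are hypothetical) of how an n-step bootstrapped target can be accumulated, truncating at episode boundaries:

```python
def n_step_target(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted sum of up to n rewards, bootstrapped with the value
    estimate of the n-th next state unless a terminal is reached first."""
    target, discount = 0.0, 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        if done:
            return target  # episode ended: no bootstrap term
        discount *= gamma
    return target + discount * bootstrap_value

# With gamma = 0.5: 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
print(n_step_target([1.0, 1.0, 1.0], [False, False, False], 10.0, gamma=0.5))
```

Compared with the 1-step TD target, this propagates reward information n transitions back per update, which is the stability/performance benefit the issue refers to.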

### 🐛 [Bug] Hindsight experience replay (HER) is not updating the done flag of HER transitions When sampling HER transitions, the SB3 implementation calculates a new reward but not a...

bug
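As a hedged sketch of what the report describes (names and the distance threshold are hypothetical, not SB3's API): when a transition is relabelled with a new goal, the sparse reward is recomputed, and the `done` flag should be recomputed consistently rather than copied from the original transition:

```python
import numpy as np

def relabel_transition(achieved_goal, new_goal, threshold=0.05):
    """Recompute sparse reward AND done for a HER-relabelled transition."""
    success = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(new_goal)) <= threshold
    reward = 0.0 if success else -1.0
    done = bool(success)  # recomputing only the reward leaves a stale done flag
    return reward, done
```

With the "final" goal-selection strategy, every relabelled transition reaches its new goal, so a stale `done=False` would systematically mis-signal non-termination.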

**Important Note: We do not do technical support or consulting** and do not answer personal questions by email. Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in...

enhancement
help wanted

### Question Are multi-output policies supported yet? I see that [dictionary observations](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#dict-observations) are supported per the docs, however I do not see anything about multi-output policies... ### Additional...

question
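One common workaround, sketched here under the assumption that a single flat `Box` action space is used (the helper name and the `head_i` output names are ours, not an SB3 API), is to have the policy emit one flat action vector and split it into the separate outputs inside the environment:

```python
import numpy as np

def split_multi_output(flat_action, sizes):
    """Split one flat action vector into named per-head outputs."""
    splits = np.split(np.asarray(flat_action), np.cumsum(sizes)[:-1])
    return {f"head_{i}": part for i, part in enumerate(splits)}

# e.g. a 5-dim action treated as a 2-dim head and a 3-dim head
outputs = split_multi_output([1.0, 2.0, 3.0, 4.0, 5.0], sizes=[2, 3])
```

This keeps the algorithm side unchanged while the environment interprets each slice as a separate output.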

Here is an issue to discuss about multi-agent and distributed agent support. My personal view on that is this should be done outside SB3 (even though it could use SB3...

enhancement
experimental

In [atari_wrappers.py](https://github.com/DLR-RM/stable-baselines3/blob/b8c72a53489c6d80196a1dc168835a2f375b868d/stable_baselines3/common/atari_wrappers.py), allow `noop_max` to be `0` to make it possible to have a completely deterministic Atari environment. Also in the same file, `ClipRewardEnv` bins rewards (keeping only their sign) rather than clipping them,...

enhancement
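The distinction matters: binning maps every nonzero reward to ±1, while true clipping preserves magnitudes inside the bounds. A small sketch (function names are ours, not the wrapper's API):

```python
import numpy as np

def bin_reward(reward):
    """Sign binning, as the wrapper's behavior is described: result in {-1, 0, +1}."""
    return float(np.sign(reward))

def clip_reward(reward, low=-1.0, high=1.0):
    """True clipping: magnitudes inside [low, high] pass through unchanged."""
    return float(np.clip(reward, low, high))
```

For a reward of 0.5, binning yields 1.0 while clipping yields 0.5, so the two are not interchangeable despite the wrapper's name.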

### 🐛 Bug I found that bug while working on #255 . The logger will only log one value even when two namespaces are specified. ### To Reproduce ```python from...

bug
help wanted
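A plausible minimal illustration of this class of bug (our own sketch, not the SB3 logger code): if keys are reduced to their final component before storage, values recorded under two namespaces collide and only the last one survives:

```python
def strip_namespace(key):
    """Keep only the part after the last '/' (hypothetical faulty reduction)."""
    return key.rsplit("/", 1)[-1]

records = {}
for full_key, value in [("train/loss", 0.1), ("eval/loss", 0.2)]:
    records[strip_namespace(full_key)] = value  # both keys map to "loss"

# records == {"loss": 0.2}: the train/ value was silently overwritten
```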

**Describe the bug** As noted in https://github.com/DLR-RM/stable-baselines3/issues/36#issuecomment-634729158, the current `common.distributions.TanhBijector` used to squash actions in SAC (for instance) can be replaced by PyTorch's native transform. We can redefine `SquashedGaussianDistribution` by...

enhancement
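The core of any tanh-squashing bijector or transform is the log-determinant correction log(1 − tanh(u)²), which both implementations must compute stably. A NumPy sketch of that correction (the function name is ours, and the Gaussian log-prob is assumed to be computed elsewhere):

```python
import numpy as np

def squashed_log_prob(gaussian_log_prob, pre_tanh):
    """Correct a Gaussian log-prob for tanh squashing:
    log p(a) = log p(u) - log(1 - tanh(u)^2),
    where the correction is computed stably as 2 * (log 2 - u - softplus(-2u))."""
    correction = 2.0 * (np.log(2.0) - pre_tanh - np.logaddexp(0.0, -2.0 * pre_tanh))
    return gaussian_log_prob - correction
```

The naive form `log(1 - tanh(u) ** 2)` underflows for large |u|; both SB3's bijector and PyTorch's native transform use a stabilized variant of this identity, which is why swapping one for the other is feasible.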

Related to #49 In https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L198-L208 https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L346-L349 we call `np.array(x).copy()`. This is unnecessary because `np.array` has a `copy` argument that is `True` by default. https://numpy.org/doc/stable/reference/generated/numpy.array.html ```python import numpy as np x...

enhancement
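A quick check confirming the claim (minimal sketch; variable names are ours): `np.array` already returns an independent copy by default, so appending `.copy()` just allocates twice:

```python
import numpy as np

src = np.zeros(3)
a = np.array(src)   # copy=True is the default: 'a' owns its own data
a[0] = 1.0          # mutating the copy...

# ...leaves the source untouched, so np.array(src).copy() copies twice
print(src[0])  # 0.0
```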

**This PR is NOT meant to be merged** This branch contains the code for reproducing the results in the paper "Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics" by...