
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Results 192 stable-baselines3 issues

Originally posted by @PartiallyTyped in https://github.com/hill-a/stable-baselines/issues/821 " N-step returns allow for much better stability and improve performance when training DQN, DDPG, etc., so it will be quite useful to have...

enhancement
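To illustrate the idea behind the request, here is a minimal sketch (not SB3 code; the function name and signature are hypothetical) of how an n-step bootstrapped target can be accumulated, truncating at episode boundaries:

```python
def n_step_target(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted sum of up to n rewards, bootstrapped with the value
    estimate of the n-th next state unless a terminal is reached first."""
    target, discount = 0.0, 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        if done:
            return target  # episode ended: no bootstrap term
        discount *= gamma
    return target + discount * bootstrap_value

# With gamma = 0.5: 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
print(n_step_target([1.0, 1.0, 1.0], [False, False, False], 10.0, gamma=0.5))
```

Compared with the 1-step TD target, this propagates reward information n transitions back per update, which is the stability/performance benefit the issue refers to.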

### 🐛 [Bug] Hindsight experience replay (HER) is not updating the done flag of HER transitions When sampling HER transitions, the SB3 implementation calculates a new reward but not a...

bug
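As a hedged sketch of what the report describes (names and the distance threshold are hypothetical, not SB3's API): when a transition is relabelled with a new goal, the sparse reward is recomputed, and the `done` flag should be recomputed consistently rather than copied from the original transition:

```python
import numpy as np

def relabel_transition(achieved_goal, new_goal, threshold=0.05):
    """Recompute sparse reward AND done for a HER-relabelled transition."""
    success = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(new_goal)) <= threshold
    reward = 0.0 if success else -1.0
    done = bool(success)  # recomputing only the reward leaves a stale done flag
    return reward, done
```

With the "final" goal-selection strategy, every relabelled transition reaches its new goal, so a stale `done=False` would systematically mis-signal non-termination.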

**Important Note: We do not do technical support or consulting** and do not answer personal questions by email. Please post your question on the [RL Discord](https://discord.com/invite/xhfNqQv), [Reddit](https://www.reddit.com/r/reinforcementlearning/) or [Stack Overflow](https://stackoverflow.com/) in...

enhancement
help wanted

### Question Are multi-output policies supported yet? I see that [dictionary observations](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#dict-observations) are supported per the docs, however I do not see anything about multi-output policies... ### Additional...

question
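One common workaround, sketched here under the assumption that a single flat `Box` action space is used (the helper name and the `head_i` output names are ours, not an SB3 API), is to have the policy emit one flat action vector and split it into the separate outputs inside the environment:

```python
import numpy as np

def split_multi_output(flat_action, sizes):
    """Split one flat action vector into named per-head outputs."""
    splits = np.split(np.asarray(flat_action), np.cumsum(sizes)[:-1])
    return {f"head_{i}": part for i, part in enumerate(splits)}

# e.g. a 5-dim action treated as a 2-dim head and a 3-dim head
outputs = split_multi_output([1.0, 2.0, 3.0, 4.0, 5.0], sizes=[2, 3])
```

This keeps the algorithm side unchanged while the environment interprets each slice as a separate output.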

Here is an issue to discuss about multi-agent and distributed agent support. My personal view on that is this should be done outside SB3 (even though it could use SB3...

enhancement
experimental

In [atari_wrappers.py](https://github.com/DLR-RM/stable-baselines3/blob/b8c72a53489c6d80196a1dc168835a2f375b868d/stable_baselines3/common/atari_wrappers.py), allow `noop_max` to be `0` to make it possible to have a completely deterministic Atari environment. Also in the same file, `ClipRewardEnv` bins rewards (keeping only their sign) rather than clipping them,...

enhancement
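The distinction matters: binning maps every nonzero reward to ±1, while true clipping preserves magnitudes inside the bounds. A small sketch (function names are ours, not the wrapper's API):

```python
import numpy as np

def bin_reward(reward):
    """Sign binning, as the wrapper's behavior is described: result in {-1, 0, +1}."""
    return float(np.sign(reward))

def clip_reward(reward, low=-1.0, high=1.0):
    """True clipping: magnitudes inside [low, high] pass through unchanged."""
    return float(np.clip(reward, low, high))
```

For a reward of 0.5, binning yields 1.0 while clipping yields 0.5, so the two are not interchangeable despite the wrapper's name.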

### 🐛 Bug I found that bug while working on #255 . The logger will only log one value even when two namespaces are specified. ### To Reproduce ```python from...

bug
help wanted
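A plausible minimal illustration of this class of bug (our own sketch, not the SB3 logger code): if keys are reduced to their final component before storage, values recorded under two namespaces collide and only the last one survives:

```python
def strip_namespace(key):
    """Keep only the part after the last '/' (hypothetical faulty reduction)."""
    return key.rsplit("/", 1)[-1]

records = {}
for full_key, value in [("train/loss", 0.1), ("eval/loss", 0.2)]:
    records[strip_namespace(full_key)] = value  # both keys map to "loss"

# records == {"loss": 0.2}: the train/ value was silently overwritten
```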

**Describe the bug** As noted in https://github.com/DLR-RM/stable-baselines3/issues/36#issuecomment-634729158, the current `common.distributions.TanhBijector` used to squash actions in SAC (for instance) can be replaced by PyTorch's native transform. We can redefine `SquashedGaussianDistribution` by...

enhancement
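The core of any tanh-squashing bijector or transform is the log-determinant correction log(1 − tanh(u)²), which both implementations must compute stably. A NumPy sketch of that correction (the function name is ours, and the Gaussian log-prob is assumed to be computed elsewhere):

```python
import numpy as np

def squashed_log_prob(gaussian_log_prob, pre_tanh):
    """Correct a Gaussian log-prob for tanh squashing:
    log p(a) = log p(u) - log(1 - tanh(u)^2),
    where the correction is computed stably as 2 * (log 2 - u - softplus(-2u))."""
    correction = 2.0 * (np.log(2.0) - pre_tanh - np.logaddexp(0.0, -2.0 * pre_tanh))
    return gaussian_log_prob - correction
```

The naive form `log(1 - tanh(u) ** 2)` underflows for large |u|; both SB3's bijector and PyTorch's native transform use a stabilized variant of this identity, which is why swapping one for the other is feasible.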

Related to #49 In https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L198-L208 https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L346-L349 we call `np.array(x).copy()`. This is unnecessary because `np.array` has a `copy` argument that is `True` by default. https://numpy.org/doc/stable/reference/generated/numpy.array.html ```python import numpy as np x...

enhancement
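A quick check confirming the claim (minimal sketch; variable names are ours): `np.array` already returns an independent copy by default, so appending `.copy()` just allocates twice:

```python
import numpy as np

src = np.zeros(3)
a = np.array(src)   # copy=True is the default: 'a' owns its own data
a[0] = 1.0          # mutating the copy...

# ...leaves the source untouched, so np.array(src).copy() copies twice
print(src[0])  # 0.0
```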

**This PR is NOT meant to be merged** This branch contains the code for reproducing the results in the paper "Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics" by...