simonsays1980

Results: 30 issues by simonsays1980

## Why are these changes needed? We are moving the standard algorithms to our new stack (i.e. `RLModule API` and `EnvRunner API`). This PR is one part of moving DQN...
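For orientation, a minimal sketch of what switching DQN onto the new stack looks like from the user side, assuming a recent Ray release where the switch lives on `AlgorithmConfig.api_stack()` (the exact flag names vary between versions and are an assumption here):

```python
# Hedged sketch: opting DQN into the RLModule/Learner- and EnvRunner-based stack.
# The api_stack() flags are assumed from recent Ray releases and may differ by version.
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
)
algo = config.build()
result = algo.train()
```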

## Why are these changes needed? We are moving DQN Rainbow over to the new stack using `RLModule API` and `EnvRunner API`. This PR introduces the learners for DQN Rainbow...

tests-ok

Custom evaluation functions are a valuable feature in the old stack. With the new stack, more specifically the new `EnvRunner API`, evaluation is always run asynchronously, but does not (yet)...
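For reference, a rough sketch of the old-stack feature in question, assuming the `custom_evaluation_function` setting of `AlgorithmConfig.evaluation()`; the callback signature shown is an assumption from the old stack:

```python
# Hedged sketch of an old-stack custom evaluation function (signature assumed).
from ray.rllib.algorithms.ppo import PPOConfig


def custom_eval(algorithm, eval_workers):
    # Run one custom evaluation round and return a metrics dict.
    eval_workers.foreach_worker(lambda worker: worker.sample())
    return {"my_eval_metric": 1.0}


config = (
    PPOConfig()
    .environment("CartPole-v1")
    .evaluation(
        evaluation_interval=1,
        custom_evaluation_function=custom_eval,
    )
)
```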

### Description It appears that the `WandBLoggerCallback` does not allow image logging at this time. There had been an issue before (#16837) that was fixed in a PR, but when...

enhancement
P2
train
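To make the use case concrete, a hedged sketch of what one would try: reporting a `wandb.Image` from a Tune trainable while the `WandbLoggerCallback` is attached. Whether the image actually arrives in W&B is exactly what this issue questions; project name and metric keys are placeholders.

```python
# Hedged sketch of the desired workflow; whether the image is forwarded is the open question.
import numpy as np
import wandb
from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback


def trainable(config):
    img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    # Report a regular scalar plus an image object.
    train.report({"score": 1.0, "sample_image": wandb.Image(img)})


tuner = tune.Tuner(
    trainable,
    run_config=train.RunConfig(
        callbacks=[WandbLoggerCallback(project="my-project")],
    ),
)
tuner.fit()
```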

### What happened + What you expected to happen # What happened I ran the tuned example [`multi_agent_pendulum_ppo_envrunner.py`](https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/ppo/multi_agent_pendulum_ppo_envrunner.py) and did not see single-agent metrics in TensorBoard. Debugging showed that they...

bug
triage

## Why are these changes needed? Large gradients, especially many of them, can lead to numerical overflow when computing their l2-norm in `torch_utils.clip_gradients` (using `"global_norm"`). This is counterproductive because...

rllib
rllib-newstack
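To illustrate the failure mode (this is only a demonstration of the overflow, not the fix applied in the PR):

```python
# Squaring large float32 gradients overflows before the square root is taken.
import torch

g = torch.full((10,), 1e20)                         # large float32 gradients
naive_norm = (g ** 2).sum().sqrt()                  # 10 * 1e40 overflows -> inf
stable_norm = torch.linalg.vector_norm(g.double())  # ~3.16e20, finite in float64
print(naive_norm, stable_norm)
```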

## Why are these changes needed? Off-policy algorithms have been moved from the old to the new stack and so far work only in single-agent mode. We were missing a standard Learner API...

go
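As a rough illustration of the multi-agent setup this targets, a hedged config sketch ("MyMultiAgentEnv" is a placeholder for any registered multi-agent env, and the `policy_mapping_fn` signature is assumed from the new stack):

```python
# Hedged sketch of a multi-agent off-policy config; env name and mapping are placeholders.
from ray.rllib.algorithms.sac import SACConfig

config = (
    SACConfig()
    .environment("MyMultiAgentEnv")
    .multi_agent(
        policies={"policy_0", "policy_1"},
        # Map each agent id to one of the policies/modules above.
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"policy_{agent_id}",
    )
)
```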

### Description None of the `EpisodeReplayBuffer`s contain any metrics. Implement the new `MetricsLogger API` in the buffers. ### Use case Get a good view of the contents, stats, and sampling...

enhancement
triage
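A minimal sketch of what this could look like, assuming a `MetricsLogger` with a `log_value(key, value, reduce=...)` method (these names are assumptions about that API, and the subclassing shown is only illustrative):

```python
# Hedged sketch: an episode buffer that counts add/sample calls via MetricsLogger.
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger
from ray.rllib.utils.replay_buffers.episode_replay_buffer import EpisodeReplayBuffer


class InstrumentedEpisodeBuffer(EpisodeReplayBuffer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metrics = MetricsLogger()

    def add(self, *args, **kwargs):
        super().add(*args, **kwargs)
        # Count add() calls; richer stats (timesteps, evictions, ...) would be logged the same way.
        self.metrics.log_value("add_calls", 1, reduce="sum")

    def sample(self, *args, **kwargs):
        out = super().sample(*args, **kwargs)
        self.metrics.log_value("sample_calls", 1, reduce="sum")
        return out
```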

### Description All the episode buffers share certain logic, albeit in slightly modified forms. This shared logic can be refactored into a common base. ### Use case This makes any inheritance easy and fast.

enhancement
P2
rllib
rllib-newstack
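A generic illustration of the refactoring pattern suggested (not RLlib's actual class layout): pull the shared logic into one base class so variants only override what differs.

```python
# Illustrative base class holding shared episode-buffer logic; subclasses override only eviction.
class EpisodeBufferBase:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._episodes = []

    def add(self, episode) -> None:
        # Shared capacity/eviction handling lives in one place.
        self._episodes.append(episode)
        while len(self._episodes) > self.capacity:
            self._evict_one()

    def _evict_one(self) -> None:
        # Default FIFO eviction.
        self._episodes.pop(0)


class PrioritizedEpisodeBuffer(EpisodeBufferBase):
    def _evict_one(self) -> None:
        # Variant-specific behavior: evict the lowest-priority episode.
        lowest = min(
            range(len(self._episodes)),
            key=lambda i: getattr(self._episodes[i], "priority", 0.0),
        )
        self._episodes.pop(lowest)
```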

## Why are these changes needed? With the new stack we will deprecate the `RolloutWorker`, which in the old stack is used for sampling from offline data. This PR is...

rllib
rllib-offline-rl
rllib-newstack
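As a rough sketch of the direction described here, offline data is pointed at via the config rather than sampled by a `RolloutWorker` (the path is a placeholder, and BC is only used as an example offline-capable algorithm):

```python
# Hedged sketch: configuring offline input through the config; path is a placeholder.
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("CartPole-v1")
    .offline_data(input_="/tmp/cartpole-offline-data")
)
```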