simonsays1980
## Why are these changes needed? We are moving the standard algorithms to our new stack (i.e. the `RLModule API` and `EnvRunner API`). This PR is one part of moving DQN...
## Why are these changes needed? We are moving DQN Rainbow over to the new stack using `RLModule API` and `EnvRunner API`. This PR introduces the learners for DQN Rainbow...
Custom evaluation functions are a valuable feature in the old stack. With the new stack, more specifically the new `EnvRunner API`, evaluation is always run asynchronously, but does not (yet)...
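For context, a rough sketch of the old-stack feature this entry refers to, modeled on RLlib's custom evaluation example (the function name, metric key, and exact config options are illustrative and may differ by version):

```python
from ray.rllib.algorithms.ppo import PPOConfig


def custom_eval_fn(algorithm, eval_workers):
    # Run one round of sampling on the evaluation workers and return a metrics
    # dict; these values show up under the "evaluation/" key in the results.
    eval_workers.foreach_worker(lambda worker: worker.sample())
    return {"my_custom_metric": 1.0}


config = (
    PPOConfig()
    .environment("CartPole-v1")
    .evaluation(
        evaluation_interval=1,
        custom_evaluation_function=custom_eval_fn,
    )
)
```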
### Description It appears that the `WandbLoggerCallback` does not allow image logging at this time. A previous issue (#16837) was fixed in a PR, but when...
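A minimal, hypothetical reproduction sketch of the attempted usage (import paths and the reporting API depend on the Ray version; the metric names and project name are made up):

```python
import numpy as np
from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback


def trainable(config):
    # A random RGB frame standing in for, e.g., a rendered environment image.
    frame = np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8)
    train.report({"episode_reward_mean": 0.0, "render": frame})


tuner = tune.Tuner(
    trainable,
    run_config=train.RunConfig(
        callbacks=[WandbLoggerCallback(project="wandb-image-logging-check")],
    ),
)
tuner.fit()
```

The reported behavior is that the array is not forwarded to W&B as an image.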
### What happened + What you expected to happen # What happened I ran the tuned example [multi_agent_pendulum_ppo_envrunner.py](https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/ppo/multi_agent_pendulum_ppo_envrunner.py) and did not see single-agent metrics in TensorBoard. Debugging showed that they...
## Why are these changes needed? Large gradients, and many of them, can lead to numerical overflow when computing their l2-norm in `torch_utils.clip_gradients` (using the "global_norm"). This is counterproductive, as...
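The overflow can be illustrated with a small, self-contained (non-RLlib) snippet: squaring large float16 gradients while computing a naive global norm overflows to `inf`, whereas accumulating in float32 stays finite. All shapes and values below are made up for illustration:

```python
import torch

# Hypothetical float16 gradients with large entries (1e4 each).
grads = [torch.full((1000,), 1e4, dtype=torch.float16) for _ in range(10)]

# Naive global norm: squaring 1e4 in float16 overflows (max ~65504), so the
# sum of squares becomes inf and clipping by this "norm" yields NaNs/zeros.
naive_global_norm = torch.sqrt(sum(torch.sum(g ** 2) for g in grads))
print(naive_global_norm)  # tensor(inf, dtype=torch.float16)

# One common remedy: accumulate the squared norms in float32.
safe_global_norm = torch.sqrt(sum(torch.sum(g.float() ** 2) for g in grads))
print(safe_global_norm)  # a finite norm (~1e6 here)
```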
## Why are these changes needed? Off-policy algorithms have been moved from the old to the new stack, but have so far only worked in single-agent mode. We were missing a standard Learner API...
### Description None of the `EpisodeReplayBuffer`s contain any metrics. Implement the new `MetricsLogger API` in the buffers. ### Use case Get a good view of the contents, stats, and sampling...
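A hypothetical sketch of what such instrumentation could look like; the buffer class and its hooks below are made up, and the `MetricsLogger` method names (`log_value`, `reduce`) are assumed from RLlib's new-stack API:

```python
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger


class InstrumentedBufferSketch:
    """Toy stand-in for an episode replay buffer with logging attached."""

    def __init__(self):
        self.metrics = MetricsLogger()

    def add(self, num_env_steps, num_episodes):
        # Running totals of what has entered the buffer.
        self.metrics.log_value("num_env_steps_added", num_env_steps, reduce="sum")
        self.metrics.log_value("num_episodes_added", num_episodes, reduce="sum")

    def sample(self, batch_size):
        # ... actual sampling logic would go here ...
        self.metrics.log_value("num_env_steps_sampled", batch_size, reduce="sum")

    def get_metrics(self):
        # Return the reduced stats collected so far.
        return self.metrics.reduce()


buf = InstrumentedBufferSketch()
buf.add(num_env_steps=200, num_episodes=3)
buf.sample(batch_size=32)
print(buf.get_metrics())
```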
### Description All the episode buffers share certain logic, albeit in slightly modified forms. This shared logic can be refactored. ### Use case This makes any inheritance easy and fast.
## Why are these changes needed? With the new stack we will deprecate the `RolloutWorker`, which in the old stack is used for sampling from offline data. This PR is...