simonsays1980
## Why are these changes needed? We are moving the standard algorithms to our new stack (i.e. the `RLModule API` and `EnvRunner API`). This PR is one part of moving DQN...
## Why are these changes needed? We are moving DQN Rainbow over to the new stack using `RLModule API` and `EnvRunner API`. This PR introduces the learners for DQN Rainbow...
Custom evaluation functions are a valuable feature in the old stack. With the new stack, more specifically the new `EnvRunner API`, evaluation is always run asynchronously, but does not (yet)...
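For context, a rough sketch of the old-stack feature this entry refers to, modeled on RLlib's custom evaluation example (the function name, metric key, and exact config options are illustrative and may differ by version):

```python
from ray.rllib.algorithms.ppo import PPOConfig


def custom_eval_fn(algorithm, eval_workers):
    # Run one round of sampling on the evaluation workers and return a metrics
    # dict; these values show up under the "evaluation/" key in the results.
    eval_workers.foreach_worker(lambda worker: worker.sample())
    return {"my_custom_metric": 1.0}


config = (
    PPOConfig()
    .environment("CartPole-v1")
    .evaluation(
        evaluation_interval=1,
        custom_evaluation_function=custom_eval_fn,
    )
)
```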
### Description It appears that the `WandbLoggerCallback` does not allow image logging at this time. A previous issue (#16837) was fixed in a PR, but when...
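A minimal, hypothetical reproduction sketch of the attempted usage (import paths and the reporting API depend on the Ray version; the metric names and project name are made up):

```python
import numpy as np
from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback


def trainable(config):
    # A random RGB frame standing in for, e.g., a rendered environment image.
    frame = np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8)
    train.report({"episode_reward_mean": 0.0, "render": frame})


tuner = tune.Tuner(
    trainable,
    run_config=train.RunConfig(
        callbacks=[WandbLoggerCallback(project="wandb-image-logging-check")],
    ),
)
tuner.fit()
```

The reported behavior is that the array is not forwarded to W&B as an image.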
### What happened + What you expected to happen # What happened I ran the tuned example [multi_agent_pendulum_ppo_envrunner.py](https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/ppo/multi_agent_pendulum_ppo_envrunner.py) and did not see single-agent metrics in TensorBoard. Debugging showed that they...
## Why are these changes needed? Large gradients, and many of them, can lead to numerical overflow when computing their l2-norm in `torch_utils.clip_gradients` (using the "global_norm"). This is counterproductive, as...
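The overflow can be illustrated with a small, self-contained (non-RLlib) snippet: squaring large float16 gradients while computing a naive global norm overflows to `inf`, whereas accumulating in float32 stays finite. All shapes and values below are made up for illustration:

```python
import torch

# Hypothetical float16 gradients with large entries (1e4 each).
grads = [torch.full((1000,), 1e4, dtype=torch.float16) for _ in range(10)]

# Naive global norm: squaring 1e4 in float16 overflows (max ~65504), so the
# sum of squares becomes inf and clipping by this "norm" yields NaNs/zeros.
naive_global_norm = torch.sqrt(sum(torch.sum(g ** 2) for g in grads))
print(naive_global_norm)  # tensor(inf, dtype=torch.float16)

# One common remedy: accumulate the squared norms in float32.
safe_global_norm = torch.sqrt(sum(torch.sum(g.float() ** 2) for g in grads))
print(safe_global_norm)  # a finite norm (~1e6 here)
```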
## Why are these changes needed? Off-policy algorithms have been moved from the old to the new stack, but have so far only worked in single-agent mode. We were missing a standard Learner API...
### Description None of the `EpisodeReplayBuffer`s contain any metrics. Implement the new `MetricsLogger API` in the buffers. ### Use case Get a good view of the contents, stats, and sampling...
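A hypothetical sketch of what such instrumentation could look like; the buffer class and its hooks below are made up, and the `MetricsLogger` method names (`log_value`, `reduce`) are assumed from RLlib's new-stack API:

```python
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger


class InstrumentedBufferSketch:
    """Toy stand-in for an episode replay buffer with logging attached."""

    def __init__(self):
        self.metrics = MetricsLogger()

    def add(self, num_env_steps, num_episodes):
        # Running totals of what has entered the buffer.
        self.metrics.log_value("num_env_steps_added", num_env_steps, reduce="sum")
        self.metrics.log_value("num_episodes_added", num_episodes, reduce="sum")

    def sample(self, batch_size):
        # ... actual sampling logic would go here ...
        self.metrics.log_value("num_env_steps_sampled", batch_size, reduce="sum")

    def get_metrics(self):
        # Return the reduced stats collected so far.
        return self.metrics.reduce()


buf = InstrumentedBufferSketch()
buf.add(num_env_steps=200, num_episodes=3)
buf.sample(batch_size=32)
print(buf.get_metrics())
```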
### Description All the episode buffers share certain logic, albeit in slightly modified forms. This shared logic can be refactored. ### Use case This makes any inheritance easy and fast.
## Why are these changes needed? With the new stack we will deprecate the `RolloutWorker`, which in the old stack is used for sampling from offline data. This PR is...