
[RLlib] Add support for multi-agent off-policy algorithms in the new API stack.

Open · simonsays1980 opened this pull request 9 months ago • 0 comments

Why are these changes needed?

Off-policy algorithms have been moved from the old to the new API stack, but so far they worked only in single-agent mode. The standard Learner API that was missing on the new stack is now available: every LearnerGroup now receives a List[EpisodeType] for updates.
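For illustration, a minimal sketch of the episode-based update path this enables is shown below. The `off_policy_training_step` helper is hypothetical, and the exact names (`EpisodeType` in `ray.rllib.utils.typing`, `update_from_episodes` on the LearnerGroup, the buffer's `add`/`sample` signatures) are assumptions that should be checked against the installed Ray version:

```python
# Rough sketch only -- the names below are assumptions, not copied from this PR.
from typing import List

from ray.rllib.utils.typing import EpisodeType  # union of single-/multi-agent episode types


def off_policy_training_step(env_runners, replay_buffer, learner_group) -> None:
    # Collect fresh (possibly multi-agent) episodes from the sampling workers.
    new_episodes: List[EpisodeType] = env_runners.sample()  # hypothetical helper

    # Store them in the episode replay buffer.
    replay_buffer.add(new_episodes)

    # Re-sample episodes and update all RLModules through the LearnerGroup,
    # which now accepts lists of episodes directly.
    sampled_episodes: List[EpisodeType] = replay_buffer.sample(num_items=256)
    learner_group.update_from_episodes(episodes=sampled_episodes)
```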

This PR adds support for multi-agent setups in off-policy algorithms using the new MultiAgentEpisodeReplayBuffer. It includes all modifications necessary for "independent" sampling and adds a SAC example to the learning_tests.
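For context, an "independent" multi-agent SAC setup on the new stack could look roughly like the sketch below. This is not taken from the PR: the MultiAgentPendulum example env, the api_stack() flags, the "MultiAgentEpisodeReplayBuffer" type string, and the policy names are assumptions modeled on RLlib's tuned examples and may differ between Ray versions.

```python
# Hedged sketch of an "independent" multi-agent SAC setup (one RLModule per agent).
from ray.rllib.algorithms.sac import SACConfig
from ray.rllib.examples.envs.classes.multi_agent import MultiAgentPendulum  # assumed path
from ray.tune.registry import register_env

# Two-agent Pendulum env; agent IDs are 0 and 1.
register_env("multi_agent_pendulum", lambda cfg: MultiAgentPendulum({"num_agents": 2}))

config = (
    SACConfig()
    # Availability of this method depends on the Ray version; older releases
    # used a different flag to enable the new API stack.
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("multi_agent_pendulum")
    .training(
        replay_buffer_config={
            "type": "MultiAgentEpisodeReplayBuffer",  # buffer used by this PR (assumed key)
            "capacity": 100_000,
        },
    )
    # "Independent" learning: each agent maps to its own policy/RLModule.
    .multi_agent(
        policies={"p0", "p1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: f"p{agent_id}",
    )
)

algo = config.build()
print(algo.train())
```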

Related issue number

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [x] Unit tests
    • [x] Release tests
    • [ ] This PR is not tested :(

simonsays1980 · May 07 '24 15:05