[RLlib] Add support for multi-agent off-policy algorithms in the new API stack.
## Why are these changes needed?
Off-policy algorithms were moved from the old to the new API stack, but have so far worked only in single-agent mode. The standard Learner API for the new stack that was previously missing is now available: any `LearnerGroup` now receives a `List[EpisodeType]` for updates.
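For orientation, here is a minimal, hedged sketch of what the episode-based update path looks like. The `replay_buffer` and `learner_group` objects, the sample size, and the helper function are illustrative assumptions, not part of this PR's diff:

```python
from typing import Dict, List

from ray.rllib.utils.typing import EpisodeType


def training_step_sketch(replay_buffer, learner_group) -> Dict:
    # Sample a list of episodes (single- or multi-agent) from the buffer.
    # The sample size of 256 is an arbitrary placeholder.
    episodes: List[EpisodeType] = replay_buffer.sample(num_items=256)
    # Pass the episodes directly to the LearnerGroup; connector pipelines
    # turn them into train batches internally.
    return learner_group.update_from_episodes(episodes=episodes)
```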
This PR adds support for multi-agent setups in off-policy algorithms via the new `MultiAgentEpisodeReplayBuffer`. It includes all modifications necessary for "independent" sampling and adds a SAC example to be included in the `learning_tests`.
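As a rough illustration of the user-facing setup, the following sketch configures multi-agent SAC on the new stack with the episode-based buffer. The env id, policy ids, and mapping function are placeholders, and the actual example added to `learning_tests` may differ:

```python
from ray.rllib.algorithms.sac import SACConfig

config = (
    SACConfig()
    # Flag names follow newer Ray releases; older versions gated the new
    # stack behind experimental settings.
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("my_multi_agent_env")  # placeholder env id
    .training(
        replay_buffer_config={
            # The episode-based multi-agent buffer added for the new stack.
            "type": "MultiAgentEpisodeReplayBuffer",
            "capacity": 100_000,
        },
    )
    .multi_agent(
        # "Independent" sampling/training: one RLModule per agent.
        policies={"agent_0", "agent_1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: agent_id,
    )
)
algo = config.build()
```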
## Related issue number
## Checks
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [x] Release tests
  - [ ] This PR is not tested :(