[RLlib] `AutoregressiveActionsRLM` overhaul to fix flaky test and simplify.

Open simonsays1980 opened this issue 4 months ago • 0 comments

Why are these changes needed?

The autoregressive-actions example was flaky (see #47876) and could be simplified, because PPO only backpropagates through the log-probabilities. This PR proposes a simplified solution that converges very quickly (11 iterations).
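For readers unfamiliar with the setup, here is a minimal sketch of the joint log-probability of a two-part autoregressive action. It is not the code from this PR: the names `softmax`, `joint_logp`, `prior_logits`, and `posterior_logits` are hypothetical, and the logits are made-up numbers. It only illustrates the chain rule log p(a1, a2) = log p(a1) + log p(a2 | a1), which is the quantity a policy-gradient loss such as PPO's would differentiate.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_logp(prior_logits, posterior_logits_by_a1, a1, a2):
    """Return log p(a1, a2) = log p(a1) + log p(a2 | a1).

    `posterior_logits_by_a1[a1]` holds the (hypothetical) logits of the
    second action component, conditioned on the first component `a1`.
    """
    p_a1 = softmax(prior_logits)
    p_a2_given_a1 = softmax(posterior_logits_by_a1[a1])
    return math.log(p_a1[a1]) + math.log(p_a2_given_a1[a2])


# Made-up logits: 3 choices for a1, and 2 choices for a2 per value of a1.
prior_logits = [0.0, 1.0, -1.0]
posterior_logits = [[2.0, 0.0], [0.0, 0.0], [-1.0, 1.0]]

logp = joint_logp(prior_logits, posterior_logits, a1=1, a2=0)
print(logp)
```

In an actual RLlib module the logits would come from neural-network heads and the log-probabilities from action-distribution objects; the sum of the two terms is the part that carries over.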

Related issue number

Closes #47876

Checks

- [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [x] I've run scripts/format.sh to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
    - [x] Unit tests
    - [x] Release tests
    - [ ] This PR is not tested :(

Oct 10 '24 13:10 simonsays1980
