[RLlib] Move learning_starts logic into execution plans
Why are these changes needed?
- `learning_starts` should be renamed to something more descriptive: `num_steps_sampled_before_learning_starts`
- Should be moved out of replay buffer config according to our philosophy: Algorithm should define what should happen when, but NOT how it should happen. `num_steps_sampled_before_learning_starts` answers the "when" and "what" questions and should thus be handled and configured on the top Algo level (not inside replay buffers).
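
To illustrate the split described above, here is a minimal, self-contained sketch. It is not RLlib code: `ToyAlgorithm`, `ReplayBuffer`, the `train_batch_size` and `capacity` keys, and the exact threshold comparison are hypothetical stand-ins chosen for illustration. Only the setting name `num_steps_sampled_before_learning_starts` comes from this PR. The buffer only knows how to store and sample; the algorithm decides when learning starts.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical buffer: only knows HOW to store and sample data."""

    def __init__(self, capacity):
        self._storage = deque(maxlen=capacity)

    def add(self, item):
        self._storage.append(item)

    def sample(self, num_items):
        # Uniform sampling with replacement.
        return random.choices(list(self._storage), k=num_items)


class ToyAlgorithm:
    """Hypothetical algorithm: decides WHAT happens WHEN."""

    def __init__(self, config):
        self.config = config
        self.buffer = ReplayBuffer(config["replay_buffer_config"]["capacity"])
        self.num_steps_sampled = 0
        self.num_updates = 0

    def training_step(self):
        # Collect one (fake) environment step and store it.
        self.buffer.add(self.num_steps_sampled)
        self.num_steps_sampled += 1

        # The "when" check lives at the algorithm level, not in the buffer.
        # Assumption: learning starts once the threshold has been reached.
        threshold = self.config["num_steps_sampled_before_learning_starts"]
        if self.num_steps_sampled < threshold:
            return None

        batch = self.buffer.sample(self.config["train_batch_size"])
        self.num_updates += 1
        return batch


config = {
    # Top-level setting, as proposed in this PR.
    "num_steps_sampled_before_learning_starts": 5,
    "train_batch_size": 2,
    # The buffer config no longer carries any "learning starts" setting.
    "replay_buffer_config": {"capacity": 100},
}
algo = ToyAlgorithm(config)
results = [algo.training_step() for _ in range(8)]
```

With this layout, swapping in a different buffer implementation does not change when learning begins, since the buffer never sees the threshold.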
Checks
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [x] This PR is not tested :(
Can someone rename this PR? This thing goes way beyond renaming a parameter, IIUC XD
@ArturNiederfahrenhorst This PR still needs to address all the TODOs