[Data] [no_early_kickoff] Change autoscaling logic to use free slots and input queue size
Addresses the following TODO:
# TODO: Replace the ready-to-total-ratio heuristic with a a work queue
# heuristic such that scale-up is only triggered if the current pool doesn't
# have enough worker slots to process the work queue.
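For illustration, the work-queue heuristic described in the TODO could look roughly like the sketch below. This is not the code in this PR: `PoolState`, its fields, and this standalone `should_scale_up` signature are hypothetical names chosen for the example, and the real policy may weigh additional factors.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot of an actor pool (not a Ray Data class)."""

    num_actors: int
    max_tasks_per_actor: int
    num_tasks_in_flight: int
    max_actors: int

    def free_slots(self) -> int:
        # Task slots that existing actors can still accept.
        return self.num_actors * self.max_tasks_per_actor - self.num_tasks_in_flight


def should_scale_up(pool: PoolState, input_queue_size: int) -> bool:
    """Scale up only when queued work exceeds the pool's free slots."""
    if pool.num_actors >= pool.max_actors:
        # The pool is already at its size limit, so it cannot grow.
        return False
    return input_queue_size > pool.free_slots()
```

With this shape, a pool that still has enough free slots for everything in the queue never requests a new actor, which is the behavior the TODO asks for.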
Also fixes an issue where execution was blocked even though free slots were available: should_scale_up now looks at the number of free slots and the input queue size.
Previously, incremental_resource_usage consulted only the autoscaling policy to determine whether new resources would be created, regardless of whether the existing actor pool had free slots, which could block execution.
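A minimal sketch of the intended accounting, assuming a simplified CPU-only model: `incremental_cpu_usage` and its parameters are hypothetical stand-ins for this example, not the signature of the real incremental_resource_usage method.

```python
def incremental_cpu_usage(
    free_slots: int, can_scale_up: bool, cpus_per_actor: float
) -> float:
    """CPUs that dispatching one more input would newly claim (hypothetical helper)."""
    if free_slots > 0:
        # An existing actor can take the task, so no new resources are needed.
        return 0.0
    if can_scale_up:
        # No free slot: running the task requires starting a new actor.
        return cpus_per_actor
    # No free slot and no room to grow: the task waits and claims nothing new.
    return 0.0
```

The point of checking free slots first is that a task which fits into the existing pool reports zero incremental usage, so it is not held back by resource limits that only matter when a new actor must be created.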
Why are these changes needed?
Related issue number
Checks
- [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [ ] I've run scripts/format.sh to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
Do we have any existing actor pool autoscaling integration tests?
Ping on this one.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
- If you'd like to keep this open, just leave any comment, and the stale label will be removed.
This is pretty important right?