[Data] [no_early_kickoff] Change autoscaling logic to use free slots and input queue size

Open amogkam opened this issue 1 year ago • 3 comments

Addresses the following TODO

# TODO: Replace the ready-to-total-ratio heuristic with a work queue
# heuristic such that scale-up is only triggered if the current pool doesn't
# have enough worker slots to process the work queue.

Also fixes the issue where execution is blocked even though free slots are available: should_scale_up now actually looks at the number of free slots and the input queue size.
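
A minimal sketch of the work-queue heuristic described above, not the code in this PR. `PoolState`, its fields, and this `should_scale_up` signature are hypothetical names chosen for illustration; the assumption is that the pool knows its actor count, per-actor task capacity, and in-flight task count.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot of an actor pool (not a Ray Data class)."""

    num_actors: int
    max_tasks_per_actor: int
    num_tasks_in_flight: int
    max_actors: int

    @property
    def free_slots(self) -> int:
        # Slots are total task capacity minus tasks currently running.
        capacity = self.num_actors * self.max_tasks_per_actor
        return capacity - self.num_tasks_in_flight


def should_scale_up(pool: PoolState, input_queue_size: int) -> bool:
    """Scale up only if the queued work does not fit in the free slots."""
    if pool.num_actors >= pool.max_actors:
        # Already at the configured ceiling; nothing to add.
        return False
    return input_queue_size > pool.free_slots
```

With this shape, a pool that still has free slots for every queued input never requests more actors, which is the behavior the TODO asks for.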

Previously, incremental_resource_usage would only look at the autoscaling policy to determine whether new resources would be created, regardless of whether there were free slots in the existing actor pool, potentially blocking execution.
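
The difference can be illustrated with a small sketch. The function and parameter names below (`incremental_cpu_usage`, `policy_allows_scale_up`, `cpus_per_actor`) are hypothetical and reduce the resource accounting to a single CPU number; the real method in Ray Data may differ.

```python
def incremental_cpu_usage(
    free_slots: int,
    policy_allows_scale_up: bool,
    cpus_per_actor: float,
) -> float:
    """Hypothetical: extra CPUs needed to accept one more input."""
    if free_slots > 0:
        # An existing actor can take the work, so no new resources
        # are required and execution should not be blocked.
        return 0.0
    if policy_allows_scale_up:
        # No free slot: accepting the input means starting a new actor.
        return cpus_per_actor
    # Pool is full and cannot grow; no new resources would be created.
    return 0.0
```

Under the old behavior as described, the first branch would be missing, so a pool with free slots could still report a nonzero incremental cost whenever the policy allowed scaling.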

Why are these changes needed?

Related issue number

Checks

  • [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

amogkam, Apr 24 '23 19:04

Do we have any existing actor pool autoscaling integration tests?

amogkam, Apr 24 '23 19:04

Ping on this one.

ericl, May 01 '23 23:05

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale[bot], Jun 15 '23 03:06

This is pretty important, right?

ericl, Jun 24 '23 04:06

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale[bot], Aug 10 '23 03:08