ray icon indicating copy to clipboard operation
ray copied to clipboard

[serve] Add `max_queued_requests` option for backpressure

Open edoakes opened this issue 1 year ago • 1 comments

Why are these changes needed?

This PR is stacked on: https://github.com/ray-project/ray/pull/43167

Adds a max_queued_requests parameter to deployments. Once this limit is reached in the number of requests queued to replicas (not assigned), load shedding will occur.

TODO before merging:

  • [ ] Add unit tests to the router.
  • [ ] Add unit tests to the proxy.
  • [x] Add integration test for HTTP, gRPC, and handle calls.

Related issue number

Closes https://github.com/ray-project/ray/issues/42950

Checks

  • [ ] I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

edoakes avatar Feb 14 '24 16:02 edoakes

@zcin I'd still like to add more unit tests but this is ready for review

edoakes avatar Feb 14 '24 17:02 edoakes