ray
ray copied to clipboard
[serve] Add `max_queued_requests` option for backpressure
Why are these changes needed?
This PR is stacked on: https://github.com/ray-project/ray/pull/43167
Adds a max_queued_requests
parameter to deployments. Once this limit is reached in the number of requests queued to replicas (not assigned), load shedding will occur.
TODO before merging:
- [ ] Add unit tests to the router.
- [ ] Add unit tests to the proxy.
- [x] Add integration test for HTTP, gRPC, and handle calls.
Related issue number
Closes https://github.com/ray-project/ray/issues/42950
Checks
- [ ] I've signed off every commit(by using the -s flag, i.e.,
git commit -s
) in this PR. - [ ] I've run
scripts/format.sh
to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a
method in Tune, I've added it in
doc/source/tune/api/
under the corresponding.rst
file.
- [ ] I've added any new APIs to the API Reference. For example, if I added a
method in Tune, I've added it in
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
@zcin I'd still like to add more unit tests but this is ready for review