ray icon indicating copy to clipboard operation
ray copied to clipboard

[Serve][Experiment] RPC support

Open sihanwang41 opened this issue 3 years ago • 2 comments

Why are these changes needed?

  1. Add rpc driver and update the controller to deploy the driver node
  2. Add schema
  3. Add test
  4. Add benchmark release/serve_tests/workloads/serve_grpc_benchmark.py, gRPC is 2.5x better throughput than HTTP
  5. Update controller logic to adopt no http proxy by using a env flag, there no harm to add it, but for performance consideration, it should be disabled. and http proxy actor is going to be deprecated in the future release.

Related issue number

Checks

  • [ ] I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

sihanwang41 avatar Aug 30 '22 17:08 sihanwang41

Nice work so far! I left a few comments.

One high-level comment– the DriverDeployment is a special-case that adds a lot of complexity. Please correct me if I'm wrong, but it seems that it's introduced solely to use SPREAD scheduling on the driver deployment replicas.

Instead, can we introduce scheduling_strategy as a new (maybe undocumented/private) ray_actor_option? That way, we can control the replica placement without adding new codepaths or a special-case DriverDeployment.

Hi @shrekris-anyscale, Currently yes, DriverDeployment only has scheduling, but to use that scheduling strategy, the deployment_state also needs to take care of collecting all node_id to deploy and make sure each node has exactly one replica to deploy. So I don't think we can avoid the code path here.

sihanwang41 avatar Sep 20 '22 03:09 sihanwang41

Nice! The main thing I wanted to ask about is the new concept of "Driver deployment" introduced in this PR. Right now does it just mean "one replica per node", or do we want the concept to encompass other things as well?

Do we plan to turn HTTPProxy and the other existing DAGDrivers into "Driver deployments" in the future?

Yes, currently it means one replica per one node. And yes, we will migrate the DAGDrivers into "Driver deployments", and even more It is also possible to directly get rid of current http proxy actor.

sihanwang41 avatar Sep 20 '22 03:09 sihanwang41

@ray-project/ray-docs can you stamp this simple change to add a new monkey patched object in conf.py? image

simon-mo avatar Sep 29 '22 18:09 simon-mo