[RFC][Serve] Multi-Application Support in the Serve 2.x API
Description
Since Ray 2.0, Ray Serve has recommended using the serve.run API over the previous .deploy()
API. However, this API has the limitation that only a single application can run per cluster. We have heard from users who need to manage multiple applications on a single cluster, and we want to extend the functionality to support this. This includes separate individual models, deployment graphs, and/or FastAPI apps.
Proposal
High level idea
- Introduce the concept of an “application” which can represent one or more deployments (bound into a graph). The driver of each application can define a FastAPI app for HTTP handling.
- Extend the existing Ray Serve API to deploy and delete multiple individual applications.
API changes
- Add name and route_prefix arguments to the serve.run() API. name is a unique identifier for each application; by default it is set to the class name of the top-level driver deployment. route_prefix is the HTTP route under which requests are sent to the application; by default it is set to the root ("/").
- Introduce a serve.delete(name) -> None API.
- Introduce a serve.list_applications() -> List[str] API.
Example
import requests
from fastapi import FastAPI
from ray import serve

# Single model
@serve.deployment
class Model1:
    def __call__(self):
        return "hello from Model1"
# Model graph
@serve.deployment
class ModelGraph1:
    def __init__(self, model2_handle):
        self.handle = model2_handle

    def __call__(self):
        return self.handle.remote()

@serve.deployment
class ModelGraph2:
    def __call__(self):
        return "hello from model"
# FastAPI application
app = FastAPI()

@serve.deployment
@serve.ingress(app)
class MyFastAPI:
    @app.get("/my_route")
    def hello(self):
        return "hello"
# Deploy the first application.
serve.run(Model1.bind(), name="my_app1", route_prefix="/app1")
requests.get("http://localhost:8000/app1")

# Deploy the second application.
serve.run(ModelGraph1.bind(ModelGraph2.bind()), name="my_app2", route_prefix="/app2")
requests.get("http://localhost:8000/app2")

# Deploy the FastAPI application.
serve.run(MyFastAPI.bind(), name="my_app3", route_prefix="/app3")
requests.get("http://localhost:8000/app3/my_route")

# Delete the first application.
serve.delete("my_app1")

# List all current applications.
serve.list_applications()  # Returns information about my_app2 & my_app3.
This change would be great as it kind of "merges" the V1 and V2 APIs, providing great flexibility.
Can these classes be defined programmatically like in the V1 API? In V1 I would write
Model1.options(param_1).deploy(route1)
Model1.options(param_2).deploy(route2)
And I would have two variants of the same model deployed on two routes. This is great if, for example, your Ray Serve cluster is downstream of other processes that train models and save the binaries to S3: you can then programmatically deploy these models, passing the S3 URL as an option.
Using this new API, I am imagining something along the lines of:
serve.run(Model1.options(url1).bind(), name="my_app1", route_prefix="/app1")
serve.run(Model1.options(url2).bind(), name="my_app2", route_prefix="/app2")
Or if this is not possible, maybe creating a factory that creates the serve deployment based on a parameter?
def ModelFactory(s3_url):
    @serve.deployment
    class Model:
        def __call__(self):
            return f"Hello from model pointing to {s3_url}"

    return Model

model1 = ModelFactory(url1)
model2 = ModelFactory(url2)
serve.run(model1.bind(), name="my_app1", route_prefix="/app1")
serve.run(model2.bind(), name="my_app2", route_prefix="/app2")
Having something like that would allow for incremental/iterative programmatic deployments like in the API v1 which would be awesome.
Hi @andreapiso, for this use case, how would you imagine sending requests to these two models? If the two models are trained separately, you can directly use different route_prefix values (e.g. app1 and app2) to differentiate them instead of the S3 URL; you then set the route_prefix in the HTTP request to differentiate them. I am thinking of something like:
@serve.deployment
class MyModel:
    def __init__(self, s3_url):
        self.model = load_model(s3_url)

    def __call__(self, input):
        return self.model(input)

serve.run(MyModel.bind(url1), name="my_app1", route_prefix="/app1")
serve.run(MyModel.bind(url2), name="my_app2", route_prefix="/app2")
requests.get("http://localhost:8000/app1")
requests.get("http://localhost:8000/app2")
Hi @sihanwang41, yes, inference would happen directly. The S3 link is useful for programmatically deploying and exposing these models in the first place. Imagine I have an upstream process that trains a model, saves the binary to S3, and calls a Ray Serve "standard serving class" asking: "please create a new endpoint to deploy this model I just trained, here is the S3 link".
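A minimal sketch of how that "standard serving class" could look on top of the proposed API (S3Model, deploy_model_from_s3, and load_model are hypothetical names, purely for illustration):

from ray import serve

@serve.deployment
class S3Model:
    def __init__(self, s3_url: str):
        # load_model is a placeholder for whatever loads the trained binary from S3.
        self.model = load_model(s3_url)

    def __call__(self, request):
        return self.model(request)

def deploy_model_from_s3(model_name: str, s3_url: str) -> None:
    # Each newly trained model becomes its own Serve application on its own route,
    # using the name/route_prefix arguments proposed in this RFC.
    serve.run(S3Model.bind(s3_url), name=model_name, route_prefix=f"/{model_name}")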
This is a great addition.
I would also add that this type of "get/update" API would be really helpful for my use case, which is trying to run a deployment and send requests via the python API programmatically.
Another thing that is useful in development is the ability to "delete" a deployment (i.e., change the port that it is bound on).
I think it is important to note that this change (the lack of support for running multiple applications in the 2.x API) also permanently and irreparably breaks integrations like the Haystack-Ray integration; if this functionality is not restored in 2.x, it will likely cause the loss of Haystack users who are using, or considering using, Ray Serve as a backend.
In Haystack, the NLP pipeline runs locally and deploys its components individually (as individual deployments) onto Ray Serve, keeping the ServeHandle for each. At runtime it calls each of those ServeHandles as the data progresses through the Haystack NLP pipeline.
This happens this way, so we can have Haystack pipeline components which are NOT deployed to Ray Serve (for various reasons) while other components of the same pipeline are deployed to Ray Serve.
This mix-and-match approach allows great flexibility.
Also this same approach allows for the reuse of the same component at runtime between multiple separate Haystack apps on the component level.
So if there are two Haystack apps with two different pipelines, but those pipelines have overlapping components, those components don't need to be deployed onto Ray Serve twice, wasting expensive resources (like GPUs); they can be deployed only once and reused between the two Haystack apps.
For example, if Haystack Pipeline A deploys a "Reader" component as part of its NLP pipeline to Ray Serve, another Haystack Pipeline B can use the same "Reader" component from Ray Serve without needing to deploy its own copy. This is achieved by the ServeHandle lookup returning the same handle as long as the deployment name and version are the same, which is important for resource re-use. This is also why we mourn the loss of the version parameter in v2.x; we can potentially work around it with the name, it just gets less clear that way.
@shrekris-anyscale has told me that this should be possible with the new design, by using a graph with individual deployments, as long as the graph abstraction is lightweight enough. So Haystack would deploy each component of its Haystack NLP Pipeline as a separate graph onto Ray Serve and keep the handle for each of those deployed graphs, something like:
# Haystack NLP Pipeline - component #1:
@serve.deployment
def f1():
    ...

graph1 = f1.bind()
handle1 = serve.run(graph1)
haystack.store(handle1)

# Haystack NLP Pipeline - component #2:
@serve.deployment
def f2():
    ...

graph2 = f2.bind()
handle2 = serve.run(graph2)
haystack.store(handle2)
...
This is the same pattern Haystack follows today with the Ray 1.x API, except that instead of bind() and serve.run() it uses deploy() and get_handle(), and then at runtime calls remote() and ray.get(). (see here and then here)
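For comparison, a rough sketch of that 1.x-style pattern (the Reader component and the query string are illustrative, not actual Haystack code):

import ray
from ray import serve

serve.start(detached=True)

# Ray Serve 1.x style: deploy each pipeline component individually
# and keep a ServeHandle for it.
@serve.deployment(name="reader", version="v1")
class Reader:
    def __call__(self, query):
        ...

Reader.deploy()
reader_handle = Reader.get_handle()

# At runtime, call the handle as data moves through the Haystack pipeline.
result = ray.get(reader_handle.remote("what is ray serve?"))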
This is really great!
I am currently using multiple applications in a single cluster with Ray 1.x. These proposed changes make it possible to have this on Ray 2.x, probably with minor changes on the application side. Also, I could use single models or model graphs.
Hi, is there any update on this? Also, how can I currently support multiple apps? I have 1 cluster with multiple apps using .deploy(). I am trying to use .bind() but it's not possible due to the limitation. What can I do currently to change my code to run .bind()?
If you want to support multiple apps, you will need to use deploy() until the changes discussed in the RFC make it in.
@andreapiso Is there any way around it? For example, creating a separate cluster for each app, or putting the models of all the apps behind a single deployment that decides, based on the route, which model to invoke.
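(For illustration, a bare-bones sketch of that second workaround: a single dispatcher deployment that picks a child model based on the request path. ModelA, ModelB, and the path scheme are made up, and the handle-call pattern assumes Ray Serve 2.x.)

from ray import serve
from starlette.requests import Request

@serve.deployment
class ModelA:
    def __call__(self, payload):
        return {"model": "A", "payload": payload}

@serve.deployment
class ModelB:
    def __call__(self, payload):
        return {"model": "B", "payload": payload}

@serve.deployment
class Router:
    def __init__(self, model_a_handle, model_b_handle):
        # Map the first URL path segment to a child deployment handle.
        self.routes = {"model_a": model_a_handle, "model_b": model_b_handle}

    async def __call__(self, request: Request):
        key = request.url.path.strip("/").split("/")[0]
        handle = self.routes.get(key)
        if handle is None:
            return {"error": f"unknown model '{key}'"}
        ref = await handle.remote(await request.json())
        return await ref

router = Router.bind(ModelA.bind(), ModelB.bind())
# serve.run(router) exposes all the models behind a single application.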
Here's an approach for deploying multiple deployments that are defined with FastAPI semantics with a single head/ingress deployment that routes to the child deployments. It's probably not the solution to everyone's use case that is following this issue, but it was the solution to mine.
This was adapted from working code; there are lots of missing import statements and such. Hopefully you can get the gist of how this works.
# test_resource.py
app_a = FastAPI()  # each deployment must have its own FastAPI app

@serve.deployment(ray_actor_options={'num_cpus': 1})
@serve.ingress(app_a)
class TestResourceUpdateDeployment:
    @app_a.patch('/test_resource/{test_resource_id}', response_model=TestResourceResponse)
    def update_test_resource(
        self,
        test_resource_data: TestResourceData,
        test_resource_id: str,
    ) -> TestResource:
        test_resource = TestResource.find(test_resource_id)
        test_resource.update(test_resource_data).save()
        return test_resource

test_resource_update_deployment = TestResourceUpdateDeployment.bind()  # type: ignore
# ingress.py
from typing import Any

from fastapi import FastAPI
from fastapi.routing import APIRoute
from ray import serve
from ray.dag.class_node import ClassNode
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Match

from test_resource import app_a, test_resource_update_deployment

@serve.deployment
class IngressDeployment:
    """
    An ingress deployment that routes requests to sub deployments using FastAPI instances.

    Each sub deployment must be passed to the initializer in a tuple with the FastAPI instance
    used to define its routes.
    """

    def __init__(self, *deployment_tuples: tuple[FastAPI, ClassNode]):
        self.routers_handles = []
        for app, handle in deployment_tuples:
            for route in app.routes:
                if isinstance(route, APIRoute):  # user-defined routes are APIRoutes
                    self.routers_handles.append((route, handle))

    async def __call__(self, request: Request) -> Any:
        """Find an APIRoute that matches the request and send the request to the accompanying handle."""
        for api_route, handle in self.routers_handles:
            match, _ = api_route.matches(request.scope)
            # Example of route matching:
            # https://github.com/tiangolo/fastapi/blob/0.85.2/fastapi/routing.py#L309-L313
            if match == Match.FULL:
                ref = await handle.remote(request)
                return await ref
        return JSONResponse(content={'detail': 'No matching Ray Serve deployment'}, status_code=404)

ingress_deployment = IngressDeployment.bind(  # type: ignore
    (app_a, test_resource_update_deployment),
)
You can run serve build ingress:ingress_deployment to get a YAML config file. The YAML will have two independent deployments, the ingress and the test resource deployment, and resources for each can be configured independently. You can use the YAML file for a serve deploy, or you can do serve run ingress:ingress_deployment.
Hi, serve.run for multi-application will be available in 2.3; it is available on master already! We are working on the CLI & REST API to support further advanced deployment of multiple applications in the 2.4 release. I will keep the channel updated when 2.3 is available (tentative: mid-February).
Thank you! Sihan
@sihanwang41, is the 2.3 version going to work with multiple FastAPI apps?
@sihanwang41 will this come to gRPC also?
Thanks for the 2.3 release with Python API. When can we expect the 2.4 release with CLI and REST API for multi applications? Also in the docs, in the production guide, it is mentioned to use serve.deploy rather than serve.run. Which should be preferred?
Hi, yes, the CLI and REST API should be supported in 2.4. You can use serve deploy for production; serve deploy calls the REST API to deploy your application.
Thanks. As of the 2.3 release, serve deploy with config files requires an import path pointing to the single top-level Serve deployment, meaning only one application is supported. Just wanted to confirm: with the updates to the CLI and REST API in 2.4, will the config files support multiple applications as well?
Yes, with 2.4 you should be able to deploy multiple applications using the CLI and REST API.
@sihanwang41 Still seeing this behavior in Ray 2.3.0 when using serve.run for multiple deployments. Can you confirm this functionality is working?
Hi @peterghaddad, can you share your script?
@sihanwang41 I am using the example on the homepage of the Ray Serve docs, taking two deployments and running serve.run twice.
Also in these docs: https://docs.ray.io/en/releases-2.4.0/serve/model_composition.html#serve-model-composition-deployment-graph
serve.run(node): This Python call can be added to your Python script to run a particular node. This call starts a Ray cluster (if one isn’t already running), deploys the node to it, and then returns. You can call this function multiple times in the same script on different DeploymentNodes. Each time, it tears down any deployments it previously deployed and deploy the passed-in node’s deployment. After the script exits, the cluster and any nodes deployed by serve.run are torn down.
Is serving multiple models with serve.run possible?
import requests
from starlette.requests import Request
from typing import Dict
from ray import serve

# 1: Define a Ray Serve deployment.
@serve.deployment(route_prefix="/")
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

@serve.deployment(route_prefix="/")
class Test:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

# 2: Deploy the model.
serve.run(MyModelDeployment.bind(msg="Hello world!"))
serve.run(Test.bind(msg="Hello world!"))
It seems like the above will randomly choose to delete one, and redeploy the other.
Using the following works.
serve.run(Test.options().bind(msg="test"), name="my_app1", route_prefix="/app1")
serve.run(MyModelDeployment.options().bind(msg="test"), name="my_app2", route_prefix="/app2")
Is anyone else able to deploy multiple independent models using serve.run? Thanks!
Hi @peterghaddad, yeah, if you don't want your old app to be killed, you need to set the "name" attribute.
For someone who comes here like me, this doc may help: https://docs.ray.io/en/releases-2.6.1/serve/deploy-many-models/multi-app.html
@sihanwang41 what is the status of multi-app support in Ray Serve? The documentation still shows only the serve.run method with a centralised YAML file, and I am not sure whether it is as flexible as the old .deploy() capability when you have multiple requests coming in parallel.
For example: every time a user presses a button on a UI, I want to deploy a model on a shared Ray cluster. Using the old API, I could just call deploy(). Now I need to retrieve the YAML file, update it, and re-run serve.run using that YAML file.
Now, what happens if two requests come in at the same time and both retrieve the original YAML file? Will they overwrite and kill each other, as they are unaware of each other at deployment time?
Hi @andreapiso, it is fully supported now!
You can use serve.run(xxx, name="app1") and serve.run(xxx, name="app2") to deploy them separately. For the YAML file, you need to include all the applications in the YAML; if an app is not in the YAML file, it will be removed. For more information, please check: https://docs.ray.io/en/latest/serve/multi-app.html