[RFC][Serve] Multi-Application Support in the Serve 2.x API
Description
Since Ray 2.0, Ray Serve has recommended using the serve.run API over the previous .deploy()
API. However, this API has the limitation that only a single application can run per cluster. We have heard from users who need to manage multiple applications on a single cluster, and we want to extend the functionality to support this. This includes separate individual models, deployment graphs, and/or FastAPI apps.
Proposal
High level idea
- Introduce the concept of an “application” which can represent one or more deployments (bound into a graph). The driver of each application can define a FastAPI app for HTTP handling.
- Extend the existing Ray Serve API to deploy and delete multiple individual applications.
API changes
- Add name and route_prefix arguments to the serve.run() API. name is a unique identifier for each application; by default it is set to the class name of the top-level driver deployment. route_prefix is the HTTP route under which requests are sent to the application; by default it is set to the root ("/").
- Introduce a serve.delete(name) -> None API.
- Introduce a serve.list_applications() -> List[str] API.
Example
import requests
from fastapi import FastAPI
from ray import serve

# Single model
@serve.deployment
class Model1:
    def __call__(self):
        return "hello from Model1"
# Model graph
@serve.deployment
class ModelGraph1:
    def __init__(self, model2_handle):
        self.handle = model2_handle

    def __call__(self):
        return self.handle.remote()

@serve.deployment
class ModelGraph2:
    def __call__(self):
        return "hello from model"
# FastAPI application
app = FastAPI()

@serve.deployment
@serve.ingress(app)
class MyFastAPI:
    @app.get("/my_route")
    def hello(self):
        return "hello"
# Deploy the first application.
serve.run(Model1.bind(), name="my_app1", route_prefix="/app1")
requests.get("http://localhost:8000/app1")

# Deploy the second application.
serve.run(ModelGraph1.bind(ModelGraph2.bind()), name="my_app2", route_prefix="/app2")
requests.get("http://localhost:8000/app2")

# Deploy the FastAPI application.
serve.run(MyFastAPI.bind(), name="my_app3", route_prefix="/app3")
requests.get("http://localhost:8000/app3/my_route")

# Delete the first application.
serve.delete("my_app1")

# List all current applications.
serve.list_applications()  # Returns information about my_app2 & my_app3.
This change would be great as it kind of "merges" the V1 and V2 APIs, providing great flexibility.
Can these classes be defined programmatically like in the V1 API? In V1 I would write
Model1.options(param_1).deploy(route1)
Model1.options(param_2).deploy(route2)
And I would have two variants of the same model deployed on two routes. This is great if, for example, your Ray Serve cluster is downstream of other processes that train models and save the binaries to S3: you can then programmatically deploy these models, passing the S3 URL as an option.
Using this new API, I am imagining something along the lines of:
serve.run(Model1.options(url1).bind(), name="my_app1", route_prefix="/app1")
serve.run(Model1.options(url2).bind(), name="my_app2", route_prefix="/app2")
Or if this is not possible, maybe creating a factory that creates the serve deployment based on a parameter?
def ModelFactory(s3_url):
    @serve.deployment
    class Model:
        def __call__(self):
            return f"Hello from model pointing to {s3_url}"

    return Model

model1 = ModelFactory(url1)
model2 = ModelFactory(url2)
serve.run(model1.bind(), name="my_app1", route_prefix="/app1")
serve.run(model2.bind(), name="my_app2", route_prefix="/app2")
Having something like that would allow for incremental/iterative programmatic deployments like in the API v1 which would be awesome.
Hi @andreapiso, for this use case, how would you imagine sending requests to these two models? If the two models are trained separately, you can directly use different route_prefix values (e.g. app1 and app2) to differentiate them instead of the S3 URL; you then set the route_prefix in the HTTP request to differentiate them. I am thinking of something like:
@serve.deployment
class MyModel:
    def __init__(self, s3_url):
        self.model = load_model(s3_url)

    def __call__(self, input):
        return self.model(input)

serve.run(MyModel.bind(url1), name="my_app1", route_prefix="/app1")
serve.run(MyModel.bind(url2), name="my_app2", route_prefix="/app2")
requests.get("http://localhost:8000/app1")
requests.get("http://localhost:8000/app2")
Hi @sihanwang41, yes, inference would happen directly. The S3 link is useful for programmatically deploying and exposing these models in the first place. Imagine I have an upstream process that trains a model, saves the binary to S3, and calls a Ray Serve "standard serving class" asking: "please create a new endpoint to deploy this model I just trained, here is the S3 link".
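A minimal sketch of how that "standard serving class" could look on top of the proposed API (S3Model, deploy_model_from_s3, and load_model are hypothetical names, purely for illustration):

from ray import serve

@serve.deployment
class S3Model:
    def __init__(self, s3_url: str):
        # load_model is a placeholder for whatever loads the trained binary from S3.
        self.model = load_model(s3_url)

    def __call__(self, request):
        return self.model(request)

def deploy_model_from_s3(model_name: str, s3_url: str) -> None:
    # Each newly trained model becomes its own Serve application on its own route,
    # using the name/route_prefix arguments proposed in this RFC.
    serve.run(S3Model.bind(s3_url), name=model_name, route_prefix=f"/{model_name}")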
This is a great addition.
I would also add that this type of "get/update" API would be really helpful for my use case, which is trying to run a deployment and send requests via the python API programmatically.
Another thing that is useful in development is the ability to "delete" a deployment (i.e., change the port that it is bound on).
I think it is important to note that this change (the lack of support for running multiple applications in the 2.x API) also permanently and irreparably breaks integrations like the Haystack-Ray integration; if this functionality is not restored in 2.x, it will likely cause the loss of Haystack users who are using, or considering using, Ray Serve as a backend.
In Haystack, the NLP pipeline runs locally and deploys its components individually (as individual deployments) onto Ray Serve, keeping the ServeHandle for each. At runtime it calls each of those ServeHandles as the data progresses through the Haystack NLP pipeline.
This happens this way, so we can have Haystack pipeline components which are NOT deployed to Ray Serve (for various reasons) while other components of the same pipeline are deployed to Ray Serve.
This mix-and-match approach allows great flexibility.
Also this same approach allows for the reuse of the same component at runtime between multiple separate Haystack apps on the component level.
So if there are two Haystack apps with two different pipelines, but those pipelines have overlapping components, those components don't need to be deployed onto Ray Serve twice, wasting expensive resources (like GPUs); they can be deployed only once and reused between the two Haystack apps.
For example, if Haystack Pipeline A deploys a "Reader" component as part of its NLP pipeline to Ray Serve, another Haystack Pipeline B can use the same "Reader" component from Ray Serve without needing to deploy its own copy. This is achieved by the ServeHandle lookup returning the same handle as long as the deployment name and version are the same, which is important for resource re-use. This is also why we mourn the loss of the version parameter in v2.x; we can potentially work around it with the name, it just gets less clear that way.
@shrekris-anyscale has told me that this should be possible with the new design, by using a graph with individual deployments, as long as the graph abstraction is lightweight enough. So Haystack would deploy each component of its Haystack NLP Pipeline as a separate graph onto Ray Serve and keep the handle for each of those deployed graphs, something like:
# Haystack NLP Pipeline - component #1:
@serve.deployment
def f1():
    ...

graph1 = f1.bind()
handle1 = serve.run(graph1)
haystack.store(handle1)

# Haystack NLP Pipeline - component #2:
@serve.deployment
def f2():
    ...

graph2 = f2.bind()
handle2 = serve.run(graph2)
haystack.store(handle2)
...
This is the same pattern Haystack follows today with the Ray 1.x API, except that instead of bind() and serve.run() it uses deploy() and get_handle(), and then at runtime calls remote() and ray.get(). (see here and then here)
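For comparison, a rough sketch of that 1.x-style pattern (the Reader component and the query string are illustrative, not actual Haystack code):

import ray
from ray import serve

serve.start(detached=True)

# Ray Serve 1.x style: deploy each pipeline component individually
# and keep a ServeHandle for it.
@serve.deployment(name="reader", version="v1")
class Reader:
    def __call__(self, query):
        ...

Reader.deploy()
reader_handle = Reader.get_handle()

# At runtime, call the handle as data moves through the Haystack pipeline.
result = ray.get(reader_handle.remote("what is ray serve?"))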
This is really great!
I am currently using multiple applications in a single cluster with Ray 1.x. These proposed changes make it possible to have this on Ray 2.x, probably with minor changes on the application side. Also, I could use single models or model graphs.
Hi, is there any update on this? Also, how can I currently support multiple apps? I have 1 cluster with multiple apps using .deploy(). I am trying to use .bind() but it's not possible due to the limitation. What can I do currently to change my code to run .bind()?
If you want to support multiple apps, you will need to use deploy() until the changes discussed in the RFC make it in.
@andreapiso Is there any way around it? For example, creating a separate cluster for each app, or putting the models of all the apps behind a single deployment that decides, based on the route, which model to invoke.
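(For illustration, a bare-bones sketch of that second workaround: a single dispatcher deployment that picks a child model based on the request path. ModelA, ModelB, and the path scheme are made up, and the handle-call pattern assumes Ray Serve 2.x.)

from ray import serve
from starlette.requests import Request

@serve.deployment
class ModelA:
    def __call__(self, payload):
        return {"model": "A", "payload": payload}

@serve.deployment
class ModelB:
    def __call__(self, payload):
        return {"model": "B", "payload": payload}

@serve.deployment
class Router:
    def __init__(self, model_a_handle, model_b_handle):
        # Map the first URL path segment to a child deployment handle.
        self.routes = {"model_a": model_a_handle, "model_b": model_b_handle}

    async def __call__(self, request: Request):
        key = request.url.path.strip("/").split("/")[0]
        handle = self.routes.get(key)
        if handle is None:
            return {"error": f"unknown model '{key}'"}
        ref = await handle.remote(await request.json())
        return await ref

router = Router.bind(ModelA.bind(), ModelB.bind())
# serve.run(router) exposes all the models behind a single application.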
Here's an approach for deploying multiple deployments that are defined with FastAPI semantics with a single head/ingress deployment that routes to the child deployments. It's probably not the solution to everyone's use case that is following this issue, but it was the solution to mine.
This was adapted from working code; there are lots of missing import statements and such. Hopefully you can get the gist of how this works.
# test_resource.py
app_a = FastAPI()  # each deployment must have its own FastAPI app

@serve.deployment(ray_actor_options={'num_cpus': 1})
@serve.ingress(app_a)
class TestResourceUpdateDeployment:
    @app_a.patch('/test_resource/{test_resource_id}', response_model=TestResourceResponse)
    def update_test_resource(
        self,
        test_resource_data: TestResourceData,
        test_resource_id: str,
    ) -> TestResource:
        test_resource = TestResource.find(test_resource_id)
        test_resource.update(test_resource_data).save()
        return test_resource

test_resource_update_deployment = TestResourceUpdateDeployment.bind()  # type: ignore
# ingress.py
from typing import Any

from fastapi import FastAPI
from fastapi.routing import APIRoute
from ray import serve
from ray.dag.class_node import ClassNode
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Match

from test_resource import app_a, test_resource_update_deployment

@serve.deployment
class IngressDeployment:
    """
    An ingress deployment that routes requests to sub deployments using FastAPI instances.

    Each sub deployment must be passed to the initializer in a tuple with the FastAPI instance
    used to define its routes.
    """

    def __init__(self, *deployment_tuples: tuple[FastAPI, ClassNode]):
        self.routers_handles = []
        for app, handle in deployment_tuples:
            for route in app.routes:
                if isinstance(route, APIRoute):  # user-defined routes are APIRoutes
                    self.routers_handles.append((route, handle))

    async def __call__(self, request: Request) -> Any:
        """Find an APIRoute that matches the request and send the request to the accompanying handle."""
        for api_route, handle in self.routers_handles:
            match, _ = api_route.matches(request.scope)
            # Example of route matching:
            # https://github.com/tiangolo/fastapi/blob/0.85.2/fastapi/routing.py#L309-L313
            if match == Match.FULL:
                ref = await handle.remote(request)
                return await ref
        return JSONResponse(content={'detail': 'No matching Ray Serve deployment'}, status_code=404)

ingress_deployment = IngressDeployment.bind(  # type: ignore
    (app_a, test_resource_update_deployment),
)
You can run serve build ingress:ingress_deployment to get a YAML config file. The YAML will have two independent deployments, the ingress and the test resource deployment, and resources for each can be configured independently. You can use the YAML file for a serve deploy, or you can do serve run ingress:ingress_deployment.
Hi, serve.run for multi-application will be available in 2.3; it is available on master already! We are working on the CLI & REST API to support further advanced deployment of multiple applications in the 2.4 release. I will keep the channel updated when 2.3 is available (tentative: mid-February).
Thank you! Sihan
@sihanwang41, is the 2.3 version going to work with multiple FastAPI apps?
@sihanwang41 will this come to gRPC also?
Thanks for the 2.3 release with Python API. When can we expect the 2.4 release with CLI and REST API for multi applications? Also in the docs, in the production guide, it is mentioned to use serve.deploy rather than serve.run. Which should be preferred?
Hi, yes, the CLI and REST API should be supported in 2.4. You can use serve deploy for production; serve deploy calls the REST API to deploy your application.
Thanks. As of the 2.3 release, serve deploy with config files requires an import path pointing to the single top-level Serve deployment, meaning only one application is supported. Just wanted to confirm: with the updates to the CLI and REST API in 2.4, will the config files support multiple applications as well?
Yes, with 2.4 you should be able to deploy multiple applications using the CLI and REST API.
@sihanwang41 Still seeing this behavior in Ray 2.3.0 when using serve.run for multiple deployments. Can you confirm this functionality is working?
Hi @peterghaddad, can you share your script?
@sihanwang41 I am using the example on the homepage of the Ray Serve docs, taking two deployments and running serve.run twice.
Also in these docs: https://docs.ray.io/en/releases-2.4.0/serve/model_composition.html#serve-model-composition-deployment-graph
serve.run(node): This Python call can be added to your Python script to run a particular node. This call starts a Ray cluster (if one isn’t already running), deploys the node to it, and then returns. You can call this function multiple times in the same script on different DeploymentNodes. Each time, it tears down any deployments it previously deployed and deploy the passed-in node’s deployment. After the script exits, the cluster and any nodes deployed by serve.run are torn down.
Is serving multiple models with serve.run possible?
import requests
from starlette.requests import Request
from typing import Dict
from ray import serve

# 1: Define a Ray Serve deployment.
@serve.deployment(route_prefix="/")
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

@serve.deployment(route_prefix="/")
class Test:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

# 2: Deploy the model.
serve.run(MyModelDeployment.bind(msg="Hello world!"))
serve.run(Test.bind(msg="Hello world!"))
It seems like the above will randomly choose to delete one, and redeploy the other.
Using the following works.
serve.run(Test.options().bind(msg="test"), name="my_app1", route_prefix="/app1")
serve.run(MyModelDeployment.options().bind(msg="test"), name="my_app2", route_prefix="/app2")
Is anyone else able to deploy multiple independent models using serve.run? Thanks!
Hi @peterghaddad, yeah, if you don't want your old app to be killed, you need to set the "name" attribute.
For someone who comes here like me, this doc may help: https://docs.ray.io/en/releases-2.6.1/serve/deploy-many-models/multi-app.html
@sihanwang41 what is the status of multi-app support in Ray Serve? The documentation still shows only the serve.run method with a centralised YAML file, and I am not sure whether it is as flexible as the old .deploy() capability when you have multiple requests coming in parallel.
For example: every time a user presses a button on a UI, I want to deploy a model on a shared Ray cluster. Using the old API, I could just call deploy(). Now I need to retrieve the YAML file, update it, and re-run serve.run using that YAML file.
Now, what happens if two requests come in at the same time and both retrieve the original YAML file? Will they overwrite and kill each other, as they are unaware of each other at deployment time?
Hi @andreapiso, it is fully supported now!
You can use serve.run(xxx, name="app1") and serve.run(xxx, name="app2") to deploy them separately. For the YAML file, you need to include all the applications in the YAML; if an app is not in the YAML file, it will be removed. For more information, please check: https://docs.ray.io/en/latest/serve/multi-app.html