modal-client [MOD-3841]: run web server in separate thread

Allows blocking calls in web_server decorated functions so you can do something like:

import modal

image = modal.Image.debian_slim().pip_install("lavague-core", "lavague-server").env({"LAVAGUE_TELEMETRY": "NONE"})
app = modal.App("lavague-server", image=image)


@app.function()
@modal.web_server(port=8000)
def websocket_app():
    from lavague.core import ActionEngine, WorldModel
    from lavague.core.agents import WebAgent
    from lavague.server import AgentSession
    from lavague.server.base import AgentServer
    from lavague.server.driver import DriverServer

    def create_agent(session: AgentSession):
        world_model = WorldModel()
        driver = DriverServer(session)
        action_engine = ActionEngine(driver)
        return WebAgent(world_model, action_engine)

    server = AgentServer(create_agent, port=8000)
    server.serve()

instead of having to run it in a thread, since it can be confusing that the web_server decorator currently expects users to run everything in a background thread

Sep 20 '24 14:09 kramstrom

If you wanted to use uvicorn.run(), couldn't you just use asgi_app in that case?

Right now web_server is designed to wait until the function finishes executing before it starts polling for the server to start up, this is more flexible than always running the code in a thread and also will propagate initialization errors sooner.

This is also a breaking change since the startup_timeout would start at function begin, not at function end.

Sep 20 '24 14:09 ekzhang

If you wanted to use uvicorn.run(), couldn't you just use asgi_app in that case?

Right now web_server is designed to wait until the function finishes executing before it starts polling for the server to start up, this is more flexible than always running the code in a thread and also will propagate initialization errors sooner.

This is also a breaking change since the startup_timeout would start at function begin, not at function end.

Yeah the uvicorn was just an example of something that will block the thread. I think the reasoning for running the function in a thread is that it can be a footgun for users that their web_server decorated function can be blocking and never therefore never return without indicating clearly why

Sep 20 '24 14:09 kramstrom

Yeah the uvicorn was just an example of something that will block the thread.

Right, I'm just saying that if the concern is ergonomics in eliminating one line of code (Thread().start()), we should design based on examples that exist rather than theoretical examples.

Sep 20 '24 15:09 ekzhang

Yeah the uvicorn was just an example of something that will block the thread.

Right, I'm just saying that if the concern is ergonomics in eliminating one line of code (Thread().start()), we should design based on examples that exist rather than theoretical examples.

Yup, that's totally fair, updated the example now to a real user issue that prompted this

Sep 23 '24 09:09 kramstrom

Do we want to merge this? The issue isn't eliminating one line of code for the user, it's that users don't realize they are supposed to launch something in the background. It's a bit of a footgun.

Jan 07 '25 20:01 aksh-at

Can we add a small test?

I agree this is valuable and prevents a common footgun. Since this is a slightly breaking change (startup_timeout, potential threading issues) we should definitely add a changelog entry about it. Maybe add a minor version bump as well?

Jan 08 '25 08:01 freider

Should the changelog call out the new timeout semantics?

Jan 08 '25 16:01 mwaskom

Do we want to merge this? The issue isn't eliminating one line of code for the user, it's that users don't realize they are supposed to launch something in the background. It's a bit of a footgun.

Yeah I understand that, but it's pretty quick to notice in development the issue and how to support it. I'd argue that the alternative is less flexible and could lead to subtle bugs in production.

Developers already seem to set quite high startup timeouts, several minutes or more. With this change it would:

Cause exceptions at process startup to not be caught until after the startup timeout, making the container idle for that time
the startup timeout would also need to be extended further to include any data loading step in the function

Would also like to implement a healthcheck / liveness loop and Elias said, which could be returned from this function. The current API is built to be forward compatible with this in mind.

Jan 08 '25 17:01 ekzhang

Also just wanted to mention this change is much harder to reverse than to apply, which makes it worth thinking about forward compatibility with a better API to support production workloads imo

Jan 08 '25 17:01 ekzhang

Closing in favor of coming up with a new solution after internal discussion!

Jan 15 '25 11:01 kramstrom