[MOD-3841]: run web server in separate thread
Allows blocking calls in web_server decorated functions so you can do something like:
import modal
image = modal.Image.debian_slim().pip_install("lavague-core", "lavague-server").env({"LAVAGUE_TELEMETRY": "NONE"})
app = modal.App("lavague-server", image=image)
@app.function()
@modal.web_server(port=8000)
def websocket_app():
from lavague.core import ActionEngine, WorldModel
from lavague.core.agents import WebAgent
from lavague.server import AgentSession
from lavague.server.base import AgentServer
from lavague.server.driver import DriverServer
def create_agent(session: AgentSession):
world_model = WorldModel()
driver = DriverServer(session)
action_engine = ActionEngine(driver)
return WebAgent(world_model, action_engine)
server = AgentServer(create_agent, port=8000)
server.serve()
instead of having to run it in a thread, since it can be confusing that the web_server decorator currently expects users to run everything in a background thread
If you wanted to use uvicorn.run(), couldn't you just use asgi_app in that case?
Right now web_server is designed to wait until the function finishes executing before it starts polling for the server to start up, this is more flexible than always running the code in a thread and also will propagate initialization errors sooner.
This is also a breaking change since the startup_timeout would start at function begin, not at function end.
If you wanted to use uvicorn.run(), couldn't you just use
asgi_appin that case?Right now
web_serveris designed to wait until the function finishes executing before it starts polling for the server to start up, this is more flexible than always running the code in a thread and also will propagate initialization errors sooner.This is also a breaking change since the
startup_timeoutwould start at function begin, not at function end.
Yeah the uvicorn was just an example of something that will block the thread. I think the reasoning for running the function in a thread is that it can be a footgun for users that their web_server decorated function can be blocking and never therefore never return without indicating clearly why
Yeah the uvicorn was just an example of something that will block the thread.
Right, I'm just saying that if the concern is ergonomics in eliminating one line of code (Thread().start()), we should design based on examples that exist rather than theoretical examples.
Yeah the uvicorn was just an example of something that will block the thread.
Right, I'm just saying that if the concern is ergonomics in eliminating one line of code (
Thread().start()), we should design based on examples that exist rather than theoretical examples.
Yup, that's totally fair, updated the example now to a real user issue that prompted this
Do we want to merge this? The issue isn't eliminating one line of code for the user, it's that users don't realize they are supposed to launch something in the background. It's a bit of a footgun.
Can we add a small test?
I agree this is valuable and prevents a common footgun. Since this is a slightly breaking change (startup_timeout, potential threading issues) we should definitely add a changelog entry about it. Maybe add a minor version bump as well?
Should the changelog call out the new timeout semantics?
Do we want to merge this? The issue isn't eliminating one line of code for the user, it's that users don't realize they are supposed to launch something in the background. It's a bit of a footgun.
Yeah I understand that, but it's pretty quick to notice in development the issue and how to support it. I'd argue that the alternative is less flexible and could lead to subtle bugs in production.
Developers already seem to set quite high startup timeouts, several minutes or more. With this change it would:
- Cause exceptions at process startup to not be caught until after the startup timeout, making the container idle for that time
- the startup timeout would also need to be extended further to include any data loading step in the function
Would also like to implement a healthcheck / liveness loop and Elias said, which could be returned from this function. The current API is built to be forward compatible with this in mind.
Also just wanted to mention this change is much harder to reverse than to apply, which makes it worth thinking about forward compatibility with a better API to support production workloads imo
Closing in favor of coming up with a new solution after internal discussion!