Add example for inference server
This is a first pass at how a server executing computations on Sky might be implemented. The use case I have in mind is when a user wants to execute certain computations on Sky triggered by HTTP requests, and based on the request parameters as input.
To test it out:
- Update the `LOCAL_UPLOAD_FOLDER` variable and the data init file mount path.
- Run `FLASK_APP=examples/inference_server/server.py flask run`, then access `http://127.0.0.1:5000` and upload an image. The server spins up a Sky cluster, initializes the cluster with some model weights, runs some computation on the uploaded image, and returns the result as the HTTP response.
Some thoughts on the implementation:
- Currently, there does not seem to be a good way of retrieving the return value of a function. As such, I had to run a regex over the logs to extract the function's result. This is not ideal, but I believe it is being addressed by #978
- Because the handler launches a new Sky cluster, the request is very long-running, currently taking > 10 min. I have 2 ideas for how to improve this:
- Use a task queue like Celery to execute the computations in the background when the handler is called, and return the job ID instead. User can use the job ID to poll the status of the job.
- It looks like `sky.launch` takes the bulk of the time. If all the computations can be executed on the same cluster, we can have a setup function run before the first request is made that spins up the cluster, and subsequent computations will all be run on this cluster using `sky.exec`.
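To make the regex-over-logs workaround concrete, here is one way it could look. This is an assumed sketch: the `INFERENCE_RESULT:` sentinel and the helper name are illustrative, not the PR's actual code. The Sky task would print its result with a known prefix, and the server scrapes it back out of the captured log stream.

```python
import re

# Assumed sentinel: the remote task prints "INFERENCE_RESULT: <value>" and we
# pull the value back out of the logs, since the return value itself is not
# directly retrievable.
RESULT_RE = re.compile(r"INFERENCE_RESULT: (.*)")

def extract_result(logs):
    # Returns the captured result string, or None if the sentinel is absent.
    match = RESULT_RE.search(logs)
    return match.group(1) if match else None

# Example log stream mixing normal task output with the printed result.
logs = "(task pid=123) setting up...\n(task pid=123) INFERENCE_RESULT: cat\n"
```

This is exactly the kind of brittleness that a first-class "return value" API would remove.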
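As a shape for the task-queue idea: a real deployment would use Celery with a broker, but a `ThreadPoolExecutor` (used here as a stand-in so the sketch is self-contained) shows the same pattern. The handler enqueues the long-running launch and returns a job ID immediately; the client polls the ID for status.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
jobs = {}  # job ID -> Future for the background computation

def submit_job(fn, *args):
    # Enqueue the computation (e.g. the Sky launch + inference) and return a
    # job ID to the HTTP client instead of blocking for > 10 min.
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(fn, *args)
    return job_id

def job_status(job_id):
    # What a polling endpoint would return for a given job ID.
    future = jobs[job_id]
    if not future.done():
        return {"state": "running"}
    return {"state": "done", "result": future.result()}
```

With Celery the shape is the same, except the futures live in a broker/result backend so jobs survive server restarts and can run on separate workers.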
Let me know what you think!
Hey @iojw, thanks for submitting this PR!
Just curious: what's the expected usage? Is it for real-time inference or offline inference? Is the inference request repeated over time? And why do we need Sky Python API for this? I think we need to make a concrete story here.
@WoosukKwon I believe the expected usage is for real-time inference, with the inference request changing based on the input image passed in the HTTP request. I think the case for using the Python API is that it is much more natural than using subprocess, since the server is built in Python as well. cc @concretevitamin