Add example for inference server
This is a first pass at how a server executing computations on Sky might be implemented. The use case I have in mind is when a user wants to execute certain computations on Sky triggered by HTTP requests, and based on the request parameters as input.
To test it out:
- Update the `LOCAL_UPLOAD_FOLDER` variable and the data init file mount path.
- Run `FLASK_APP=examples/inference_server/server.py flask run`, then access `http://127.0.0.1:5000` and upload an image. The server spins up a Sky cluster, initializes the cluster with some model weights, runs some computation on the uploaded image, and returns the result as the HTTP response.
Some thoughts on the implementation:
- Currently, there does not seem to be a good way of retrieving the return value of a function. As such, I had to run a regex over the logs to extract the function's result. This is not ideal, but I believe it is being addressed by #978
- Because the handler launches a new Sky cluster, the request is very long-running, currently taking > 10 min. I have 2 ideas for how to improve this:
- Use a task queue like Celery to execute the computations in the background when the handler is called, and return the job ID instead. User can use the job ID to poll the status of the job.
- It looks like `sky.launch` takes the bulk of the time. If all the computations can be executed on the same cluster, we can have a setup function run before the first request is made that spins up the cluster, and subsequent computations will all be run on this cluster using `sky.exec`.
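To make the regex-over-logs workaround concrete, here is one way it could look. This is an assumed sketch: the `INFERENCE_RESULT:` sentinel and the helper name are illustrative, not the PR's actual code. The Sky task would print its result with a known prefix, and the server scrapes it back out of the captured log stream.

```python
import re

# Assumed sentinel: the remote task prints "INFERENCE_RESULT: <value>" and we
# pull the value back out of the logs, since the return value itself is not
# directly retrievable.
RESULT_RE = re.compile(r"INFERENCE_RESULT: (.*)")

def extract_result(logs):
    # Returns the captured result string, or None if the sentinel is absent.
    match = RESULT_RE.search(logs)
    return match.group(1) if match else None

# Example log stream mixing normal task output with the printed result.
logs = "(task pid=123) setting up...\n(task pid=123) INFERENCE_RESULT: cat\n"
```

This is exactly the kind of brittleness that a first-class "return value" API would remove.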
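As a shape for the task-queue idea: a real deployment would use Celery with a broker, but a `ThreadPoolExecutor` (used here as a stand-in so the sketch is self-contained) shows the same pattern. The handler enqueues the long-running launch and returns a job ID immediately; the client polls the ID for status.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
jobs = {}  # job ID -> Future for the background computation

def submit_job(fn, *args):
    # Enqueue the computation (e.g. the Sky launch + inference) and return a
    # job ID to the HTTP client instead of blocking for > 10 min.
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(fn, *args)
    return job_id

def job_status(job_id):
    # What a polling endpoint would return for a given job ID.
    future = jobs[job_id]
    if not future.done():
        return {"state": "running"}
    return {"state": "done", "result": future.result()}
```

With Celery the shape is the same, except the futures live in a broker/result backend so jobs survive server restarts and can run on separate workers.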
Let me know what you think!
Hey @iojw, thanks for submitting this PR!
Just curious: what's the expected usage? Is it for real-time inference or offline inference? Is the inference request repeated over time? And why do we need Sky Python API for this? I think we need to make a concrete story here.
@WoosukKwon I believe the expected usage is for real-time inference, with the inference request changing based on the input image passed in the HTTP request. I think the case for using the Python API is that it is much more natural than using subprocess, since the server is built in Python as well. cc @concretevitamin