ray Add inference serve example to run Stable Diffusion Inference using AWS Inferentia2

Why are these changes needed?

This example shows how to serve Stable Diffusion inference with Ray Serve on AWS Inferentia2.

Related issue number

Closes #43018

Checks

[x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
[x] I've run scripts/format.sh to lint the changes in this PR.
[x] I've included any doc changes needed for https://docs.ray.io/en/master/.
  - [x] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
[ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
  - [x] Manual testing

Tested on an Inferentia (inf2.xl) instance (with 2 neuron_cores).

Serve deployment

2024-02-07 17:53:28,299	INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
(ProxyActor pid=25282) INFO 2024-02-07 17:53:31,751 proxy 172.31.10.188 proxy.py:1128 - Proxy actor fd464602af1e456162edf6f901000000 starting on node 5a8e0c24b22976f1f7672cc54f13ace25af3664a51429d8e332c0679.
(ProxyActor pid=25282) INFO 2024-02-07 17:53:31,755 proxy 172.31.10.188 proxy.py:1333 - Starting HTTP server on node: 5a8e0c24b22976f1f7672cc54f13ace25af3664a51429d8e332c0679 listening on port 8000
(ProxyActor pid=25282) INFO:     Started server process [25282]
(ServeController pid=25233) INFO 2024-02-07 17:53:31,921 controller 25233 deployment_state.py:1545 - Deploying new version of deployment StableDiffusionV2 in application 'default'. Setting initial target number of replicas to 1.
(ServeController pid=25233) INFO 2024-02-07 17:53:31,922 controller 25233 deployment_state.py:1545 - Deploying new version of deployment APIIngress in application 'default'. Setting initial target number of replicas to 1.
(ServeController pid=25233) INFO 2024-02-07 17:53:32,024 controller 25233 deployment_state.py:1829 - Adding 1 replica to deployment StableDiffusionV2 in application 'default'.
(ServeController pid=25233) INFO 2024-02-07 17:53:32,029 controller 25233 deployment_state.py:1829 - Adding 1 replica to deployment APIIngress in application 'default'.
Fetching 20 files: 100%|██████████| 20/20 [00:00<00:00, 195538.65it/s]
(ServeController pid=25233) WARNING 2024-02-07 17:54:02,114 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=25233) WARNING 2024-02-07 17:54:32,170 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=25233) WARNING 2024-02-07 17:55:02,344 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=25233) WARNING 2024-02-07 17:55:32,418 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
2024-02-07 17:55:46,263	SUCC scripts.py:483 -- Deployed Serve app successfully.
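
For reference, the end-to-end startup time can be read off the first and last timestamps in the log above. The snippet below is only a small sketch of that arithmetic: the two timestamp strings are copied from the log, while the helper name `elapsed_seconds` is illustrative and not part of this PR.

```python
from datetime import datetime

# Timestamp layout used by the Ray log lines above (milliseconds after the comma).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TS_FORMAT)
    finished = datetime.strptime(end, LOG_TS_FORMAT)
    return (finished - started).total_seconds()


# First line ("Started a local Ray instance") and last line ("Deployed Serve app successfully").
startup = elapsed_seconds("2024-02-07 17:53:28,299", "2024-02-07 17:55:46,263")
print(f"Startup took about {startup:.0f} s")
```

That is a little over two minutes, which matches the four "more than 30s to initialize" warnings logged for StableDiffusionV2 before the deploy succeeded.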

Sample Test

import requests

prompt = "a zebra is dancing in the grass, river, sunlit"
# Let requests URL-encode the prompt instead of joining words with "%20" by hand.
resp = requests.get("http://127.0.0.1:8000/imagine", params={"prompt": prompt})
resp.raise_for_status()  # Fail early rather than writing an error body to disk.

print("Writing the response to `output.png`.")
with open("output.png", "wb") as f:
    f.write(resp.content)
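
If the request URL has to be built without `requests` (for example, to paste into a browser or pass to another HTTP client), the prompt still needs to be percent-encoded, since it contains commas as well as spaces. The following is a minimal standard-library sketch. The helper names `build_imagine_url` and `looks_like_png` are hypothetical, and the PNG check assumes the `/imagine` endpoint returns PNG bytes, which the sample suggests by writing to `output.png` but the PR text does not state outright.

```python
from urllib.parse import quote, urlencode

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_imagine_url(prompt: str, base: str = "http://127.0.0.1:8000") -> str:
    """Return the /imagine URL with the prompt percent-encoded (spaces as %20)."""
    query = urlencode({"prompt": prompt}, quote_via=quote)
    return f"{base}/imagine?{query}"


def looks_like_png(data: bytes) -> bool:
    """Cheap sanity check (assumes a PNG response) before saving to disk."""
    return data.startswith(PNG_SIGNATURE)


url = build_imagine_url("a zebra is dancing in the grass, river, sunlit")
print(url)
```

Calling `looks_like_png(resp.content)` before writing the file would catch the case where the server returned an error page instead of an image.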

Feb 07 '24 21:02 ratnopam

This PR references an image that's submitted in https://github.com/ray-project/images/pull/18

Feb 07 '24 21:02 ratnopam

Is anyone able to review this PR and merge if ok? Thanks.

Feb 13 '24 16:02 ratnopam

Hi all, to add this to the new example gallery that we are building, we need the following information:

Skill level (beginner, intermediate, advanced)
Frameworks (pytorch, deepspeed, etc)
Use case (see the use cases section on the primary sidebar on the left of this page: https://docs.ray.io/en/latest/ray-overview/examples.html)

Thank you in advance!

Feb 13 '24 21:02 peytondmurray

Generated doc can be viewed at https://anyscale-ray--43046.com.readthedocs.build/en/43046/serve/tutorials/index.html.

Feb 17 '24 05:02 ratnopam

@edoakes can you help to merge this?

Feb 20 '24 17:02 GeneDer

There's a merge conflict here

Feb 20 '24 20:02 edoakes

@ratnopamc can you address the merge conflicts as well? 🙏

Feb 20 '24 20:02 GeneDer

@edoakes this should be ready for merging now

Feb 21 '24 16:02 GeneDer