Add inference serve example to run Stable Diffusion Inference using AWS Inferentia2

Open ratnopam opened this issue 1 year ago • 5 comments

Why are these changes needed?

This example shows how to serve Stable Diffusion inference with Ray Serve on AWS Inferentia2.

Related issue number

Closes #43018

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [x] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [x] Manual testing

Tested on an Inferentia2 (inf2.xl) instance with 2 neuron_cores.

Serve deployment

2024-02-07 17:53:28,299	INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
(ProxyActor pid=25282) INFO 2024-02-07 17:53:31,751 proxy 172.31.10.188 proxy.py:1128 - Proxy actor fd464602af1e456162edf6f901000000 starting on node 5a8e0c24b22976f1f7672cc54f13ace25af3664a51429d8e332c0679.
(ProxyActor pid=25282) INFO 2024-02-07 17:53:31,755 proxy 172.31.10.188 proxy.py:1333 - Starting HTTP server on node: 5a8e0c24b22976f1f7672cc54f13ace25af3664a51429d8e332c0679 listening on port 8000
(ProxyActor pid=25282) INFO:     Started server process [25282]
(ServeController pid=25233) INFO 2024-02-07 17:53:31,921 controller 25233 deployment_state.py:1545 - Deploying new version of deployment StableDiffusionV2 in application 'default'. Setting initial target number of replicas to 1.
(ServeController pid=25233) INFO 2024-02-07 17:53:31,922 controller 25233 deployment_state.py:1545 - Deploying new version of deployment APIIngress in application 'default'. Setting initial target number of replicas to 1.
(ServeController pid=25233) INFO 2024-02-07 17:53:32,024 controller 25233 deployment_state.py:1829 - Adding 1 replica to deployment StableDiffusionV2 in application 'default'.
(ServeController pid=25233) INFO 2024-02-07 17:53:32,029 controller 25233 deployment_state.py:1829 - Adding 1 replica to deployment APIIngress in application 'default'.
Fetching 20 files: 100%|██████████| 20/20 [00:00<00:00, 195538.65it/s]
(ServeController pid=25233) WARNING 2024-02-07 17:54:02,114 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=25233) WARNING 2024-02-07 17:54:32,170 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=25233) WARNING 2024-02-07 17:55:02,344 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=25233) WARNING 2024-02-07 17:55:32,418 controller 25233 deployment_state.py:2171 - Deployment 'StableDiffusionV2' in application 'default' has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
2024-02-07 17:55:46,263	SUCC scripts.py:483 -- Deployed Serve app successfully.
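
For a rough sense of startup cost, the first and last timestamps in the log above can be subtracted. This is a minimal sketch, assuming both lines were written by the same clock on the same day; the helper name `parse_ray_timestamp` is hypothetical and not part of Ray.

```python
from datetime import datetime

# Timestamps copied from the first ("Started a local Ray instance") and
# last ("Deployed Serve app successfully") lines of the log above.
started = "2024-02-07 17:53:28,299"
deployed = "2024-02-07 17:55:46,263"


def parse_ray_timestamp(stamp: str) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS,mmm' timestamp format seen in the log.

    Hypothetical helper for illustration only.
    """
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


elapsed = parse_ray_timestamp(deployed) - parse_ray_timestamp(started)
print(f"Deployment took about {elapsed.total_seconds():.1f} s")
```

Most of that time falls between the repeated "more than 30s to initialize" warnings, while the StableDiffusionV2 replica was still starting up.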

Sample Test

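import requests

prompt = "a zebra is dancing in the grass, river, sunlit"
# Pass the prompt via `params` so requests URL-encodes it;
# this also avoids shadowing the built-in `input`.
resp = requests.get("http://127.0.0.1:8000/imagine", params={"prompt": prompt})
resp.raise_for_status()  # stop on an HTTP error instead of saving it as a PNG

print("Write the response to `output.png`.")
with open("output.png", "wb") as f:
    f.write(resp.content)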
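
Because the test writes whatever the server returns straight to `output.png`, an error body would be saved as if it were an image. The sketch below checks the payload first, assuming the endpoint returns raw PNG bytes (as the output filename suggests); `is_png` is a hypothetical helper, not part of the example in this PR.

```python
# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(payload: bytes) -> bool:
    """Return True if the payload begins with the PNG signature.

    Hypothetical helper; it inspects only the header, not the full image.
    """
    return payload.startswith(PNG_SIGNATURE)


# A JSON error body does not pass the check, while PNG-like bytes do.
print(is_png(b'{"detail": "Internal Server Error"}'))
print(is_png(PNG_SIGNATURE + b"...image data..."))
```

In the sample test, this would amount to calling `is_png(resp.content)` before opening `output.png` for writing.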

Feb 07 '24 21:02 ratnopam

This PR references an image that's submitted in https://github.com/ray-project/images/pull/18

Feb 07 '24 21:02 ratnopam

Could someone review this PR and merge it if it looks OK? Thanks.

Feb 13 '24 16:02 ratnopam

Hi all, to add this to the new example gallery that we are building we need the following information:

  • Skill level (beginner, intermediate, advanced)
  • Frameworks (pytorch, deepspeed, etc)
  • Use case (see the use cases section on the primary sidebar on the left of this page: https://docs.ray.io/en/latest/ray-overview/examples.html)

Thank you in advance!

Feb 13 '24 21:02 peytondmurray

Generated doc can be viewed at https://anyscale-ray--43046.com.readthedocs.build/en/43046/serve/tutorials/index.html.

Feb 17 '24 05:02 ratnopam

@edoakes can you help merge this?

Feb 20 '24 17:02 GeneDer

There's a merge conflict here

Feb 20 '24 20:02 edoakes

@ratnopamc can you address the merge conflicts as well? 🙏

Feb 20 '24 20:02 GeneDer

@edoakes this should be ready for merging now

Feb 21 '24 16:02 GeneDer