data-on-eks

Inferentia: Test horizontal scaling of Ray Worker Pods

Open • ratnopam opened this issue 1 year ago • 1 comment

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

Run a scale test to verify that Ray Serve replicas and Ray worker Pods scale horizontally.

Describe the solution you would like

Develop a test script that sends concurrent requests to the Ray Serve inference endpoint for the Stable Diffusion model. Verify that new nodes and Pods are created once the HTTP request limit specified in the Ray Serve configuration is reached.
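A minimal sketch of what such a test script could look like, using only the Python standard library. The endpoint URL, the `/imagine` path, the prompt, and the request counts are assumptions for illustration and are not taken from this issue; substitute the values for the actual deployment.

```python
import concurrent.futures
import time
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical endpoint; replace with the real Ray Serve service URL.
ENDPOINT = "http://localhost:8000/imagine?prompt=a+red+bicycle"


def send_request(url, timeout=300.0):
    """Send one GET request; return the HTTP status code, or -1 on a connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except Exception:
        # Connection refused, timeout, DNS failure, and similar.
        return -1


def run_load_test(url, total_requests, concurrency, sender=send_request):
    """Issue total_requests calls with at most `concurrency` requests in flight.

    Returns a Counter of status codes and the elapsed wall-clock seconds.
    """
    start = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(sender, [url] * total_requests))
    elapsed = time.monotonic() - start
    return Counter(statuses), elapsed
```

For example, `run_load_test(ENDPOINT, total_requests=200, concurrency=50)` keeps up to 50 requests in flight. While it runs, `kubectl get pods -w` and `kubectl get nodes -w` in the namespace where the Ray cluster is deployed show whether new worker Pods and nodes appear.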

Describe alternatives you have considered

Additional context

This exercise is to ensure that Ray Serve replicas and Pods scale based on the number of concurrent HTTP requests.
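To decide whether an observed replica count is plausible, a rough expectation can be computed up front. The sketch below assumes a simplified model in which the autoscaler aims for a fixed number of in-flight requests per replica, bounded by a minimum and maximum replica count. The function and parameter names are hypothetical and are not Ray's API; the real autoscaler also applies smoothing and delays, so treat the result as an approximate target only.

```python
import math


def expected_replicas(ongoing_requests, target_per_replica, min_replicas, max_replicas):
    """Approximate steady-state replica count under a simplified scaling model."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Enough replicas so each one handles at most target_per_replica requests.
    desired = math.ceil(ongoing_requests / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

Under this model, 25 concurrent requests with a target of 10 per replica and bounds of 1 to 8 would call for 3 replicas.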

ratnopam, Feb 24 '24 00:02

https://github.com/ray-project/ray/issues/44361

ratnopam, Mar 29 '24 20:03