data-on-eks icon indicating copy to clipboard operation
data-on-eks copied to clipboard

[Inference]: RayLLM pattern for LLMs

Open askulkarni2 opened this issue 2 years ago • 1 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

RayLLM is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs, built on Ray Serve. It will allow us to provide an OOTB RESTful API for LLMs sourced from HuggingFace (including custom models).

Describe the solution you would like

Update JARK stack and other RayServe examples to use RayLLM.

askulkarni2 avatar Oct 05 '23 18:10 askulkarni2

Look into vLLM under the hood for autoscaling, continuous batching basically efficiently scaling LLM inference. Use https://github.com/ray-project/llmperf for benchmarking.

askulkarni2 avatar Feb 22 '24 19:02 askulkarni2