data-on-eks
feat: Inference using vLLM with RayServe on Inf2
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
What is the outcome that you are trying to reach?
- Create a new blueprint that shows how to use vLLM with RayServe on AWS Inf2 instances
- Deploy an LLM as an inference example
Describe the solution you would like
Create a pattern that serves inference using Ray with a vLLM backend on Inf2. Add a website doc with step-by-step instructions for deploying and testing the pattern.
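To make the "testing" part of the request concrete, here is a minimal sketch of a client that could exercise the deployed endpoint. The blueprint does not exist yet, so the URL, the `/generate` route, and the `prompt` / `max_tokens` JSON fields are all placeholder assumptions rather than an agreed API; only the Python standard library is used.

```python
import json
import urllib.request

# Hypothetical values: the blueprint does not exist yet, so the URL, route,
# and JSON field names below are placeholders, not a real API.
ENDPOINT = "http://localhost:8000/generate"


def build_request(
    prompt: str, max_tokens: int = 128, endpoint: str = ENDPOINT
) -> urllib.request.Request:
    """Build a JSON POST request for the (assumed) inference route."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the service port-forwarded to `localhost:8000` (again an assumption), calling `query("What is AWS Inferentia?")` would return whatever body the deployment sends back. Keeping request construction separate from the network call lets the doc's test steps check the payload shape without a running cluster.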
Describe alternatives you have considered
Additional context
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html