Support vLLM and deepspeed-fastgen for LLM inference
/kind feature
Describe the solution you'd like
Suggest supporting vLLM or DeepSpeed-FastGen for LLM inference; both are popular inference projects right now.
Anything else you would like to add: fast LLM inference support in KServe
Check out https://github.com/kserve/kserve/pull/3334, which provides a high-level abstraction based on Hugging Face. A vLLM backend will also be implemented.
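For context, once that Hugging Face runtime lands, serving a model through it with a vLLM backend might look roughly like the sketch below. This is a hypothetical manifest: the `modelFormat` name, the `--model_id` and `--backend` args, and the model name are assumptions drawn from the linked PRs, not a final API.

```yaml
# Hypothetical sketch of an InferenceService using the proposed
# Hugging Face runtime with a vLLM backend. Field names and flags
# are assumptions based on PR #3334, not a confirmed interface.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-2-7b          # example name, not from the PRs
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface   # assumed runtime name
      args:
        - --model_id=meta-llama/Llama-2-7b-chat-hf  # example model
        - --backend=vllm    # assumed flag selecting the vLLM engine
      resources:
        limits:
          nvidia.com/gpu: "1"
```

Apply it with `kubectl apply -f` as with any other InferenceService; the actual field names may differ once the PRs merge.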
@jinchihe vLLM support is being added through this change: #3415