ray_vllm_inference

A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.

Results: 4 ray_vllm_inference issues, sorted by recently updated

I've got a Kubernetes cluster with the KubeRay Operator installed and a RayCluster created, but how do I then write a manifest for Ray Serve to serve, say, a Llama 2 or...
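
A minimal sketch of what such a manifest might look like, assuming the KubeRay RayService CRD is used and that the container image has ray_vllm_inference installed. The application name, route prefix, image tag, and the `model` argument are illustrative assumptions, not values taken from this repository:

```yaml
# Hypothetical RayService manifest (sketch) -- names, image, and args are assumptions.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llama2-serve
spec:
  serveConfigV2: |
    applications:
      - name: llama2_app
        route_prefix: /
        import_path: ray_vllm_inference.vllm_serve:deployment
        args:
          model: meta-llama/Llama-2-7b-chat-hf   # assumed argument name
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: my-registry/ray-vllm:latest   # assumed image with ray_vllm_inference installed
    workerGroupSpecs:
      - groupName: gpu-workers
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: my-registry/ray-vllm:latest
                resources:
                  limits:
                    nvidia.com/gpu: 1
```

With a manifest along these lines, KubeRay manages the Serve application on top of the RayCluster; applying it with `kubectl apply -f` is the usual workflow.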

Your tutorial uses an online model download, but some environments are offline. Is there a guide for deploying an LLM offline?
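
One common workaround, sketched below, is to pre-download the weights to a path visible to every Ray node and point the application at that path instead of a Hugging Face model ID, while disabling network lookups. This assumes the application accepts a `model` argument; the path, app name, and environment variables are illustrative:

```yaml
# Hypothetical offline variant of the Serve config -- the local path and the
# `model` argument name are assumptions.
host: 0.0.0.0
port: 8000
applications:
  - name: offline_app
    route_prefix: /
    import_path: ray_vllm_inference.vllm_serve:deployment
    runtime_env:
      env_vars:
        HF_HUB_OFFLINE: "1"         # tell huggingface_hub not to reach the network
        TRANSFORMERS_OFFLINE: "1"
    args:
      model: /models/llama-2-7b-chat-hf   # pre-downloaded weights on a shared volume (assumed path)
```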

This is my .yaml configuration file:

```yaml
# Serve config file
#
# For documentation see:
# https://docs.ray.io/en/latest/serve/production-guide/config.html
host: 0.0.0.0
port: 8000
applications:
- name: demo_app
  route_prefix: /a
  import_path: ray_vllm_inference.vllm_serve:deployment...
```

I want to run a model service on multiple GPUs without tensor parallelism (TP).
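
One way to read this request is to run several independent single-GPU replicas instead of sharding one model with tensor parallelism. A sketch of a Serve config doing that is below, assuming the deployment exposed by ray_vllm_inference.vllm_serve can be overridden through the standard `deployments` section of the Serve config; the deployment name `VLLMDeployment`, the replica count, and the model argument are assumptions:

```yaml
# Hypothetical Serve config running two independent single-GPU replicas (no tensor parallelism).
# The deployment name "VLLMDeployment" and the model argument are assumptions.
host: 0.0.0.0
port: 8000
applications:
  - name: multi_gpu_app
    route_prefix: /
    import_path: ray_vllm_inference.vllm_serve:deployment
    args:
      model: facebook/opt-125m          # placeholder model
    deployments:
      - name: VLLMDeployment
        num_replicas: 2                 # one replica per GPU
        ray_actor_options:
          num_gpus: 1                   # each replica gets a whole GPU
```

Each replica loads its own full copy of the weights, so this approach only works when the model fits on a single GPU; requests are then load-balanced across the replicas by Serve.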