ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
So I've got a Kubernetes cluster with the KubeRay Operator installed and a RayCluster created, but how do I then write a manifest for Ray Serve to serve, say, a Llama 2 model or... (see the sketch below)
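Not an official guide from this project, but a minimal sketch of what a KubeRay `RayService` manifest could look like, reusing the `ray_vllm_inference.vllm_serve:deployment` import path from the config shown further down. The image tags, Ray version, model id, and the `model` argument name are placeholders/assumptions and need to be checked against the repo and your cluster:

```yaml
# Sketch of a KubeRay RayService manifest (field names follow the RayService CRD).
# Image tags, rayVersion, model id and the `model` arg are assumptions.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llama2-vllm
spec:
  serveConfigV2: |
    applications:
      - name: llm
        route_prefix: /
        import_path: ray_vllm_inference.vllm_serve:deployment
        args:
          model: meta-llama/Llama-2-7b-chat-hf   # placeholder; depends on how the deployment is parameterized
  rayClusterConfig:
    rayVersion: "2.9.0"                          # placeholder Ray version
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0        # placeholder image tag
    workerGroupSpecs:
      - groupName: gpu-workers
        replicas: 1
        minReplicas: 1
        maxReplicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-ml:2.9.0-gpu   # placeholder GPU image
                resources:
                  limits:
                    nvidia.com/gpu: 1
```

Applying this manifest lets the operator manage both the Ray cluster and the Serve application; if you want to keep your existing RayCluster instead, you can submit the same `serveConfigV2` contents as a plain Serve config against that cluster.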
Your tutorial uses an online model download, but some environments are offline. Is there a deployment guide for offline LLM serving? (see the sketch below)
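There is no offline guide in this section, but one common approach is to pre-download the weights, mount them into the containers, and point the Serve application at the local directory while forcing Hugging Face libraries into offline mode. A sketch, where the `/models/...` path and the `model` argument name are assumptions (check how `vllm_serve:deployment` actually takes its model parameter):

```yaml
# Sketch: offline serving from a pre-downloaded model directory.
applications:
  - name: llm
    route_prefix: /
    import_path: ray_vllm_inference.vllm_serve:deployment
    runtime_env:
      env_vars:
        HF_HUB_OFFLINE: "1"          # stop huggingface_hub from hitting the network
        TRANSFORMERS_OFFLINE: "1"    # same for transformers
    args:
      model: /models/Llama-2-7b-chat-hf   # local directory with weights + tokenizer (placeholder path)
```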
This is my `.yaml` configuration file:

```yaml
# Serve config file
#
# For documentation see:
# https://docs.ray.io/en/latest/serve/production-guide/config.html

host: 0.0.0.0
port: 8000

applications:
- name: demo_app
  route_prefix: /a
  import_path: ray_vllm_inference.vllm_serve:deployment
  ...
```
I want to run a model service on multiple GPUs without tensor parallelism (TP). (see the sketch below)
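One way to use several GPUs without TP is data parallelism: run multiple Serve replicas of the deployment, each pinned to a single GPU, and let Serve load-balance requests across them. A sketch of the per-deployment override in the Serve config; the deployment name `VLLMDeployment` is a guess and must match the `@serve.deployment` class actually defined in `vllm_serve.py`:

```yaml
# Sketch: N independent replicas, one GPU each, no tensor parallelism.
applications:
- name: demo_app
  route_prefix: /a
  import_path: ray_vllm_inference.vllm_serve:deployment
  deployments:
  - name: VLLMDeployment       # placeholder; use the real deployment class name
    num_replicas: 4            # one replica per GPU
    ray_actor_options:
      num_gpus: 1              # pin each replica to a single GPU
```

Note that each replica loads its own full copy of the model, so this only works when the model fits on one GPU.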