
Running ray-llm 0.5.0 on g4dn.12xlarge instance

golemsentience opened this issue 1 year ago · 2 comments

Has anyone had any success serving llms through the 0.5.0 docker image?

I followed these steps:

cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}

docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=/tmp/data -v $cache_dir:/home/user/data anyscale/ray-llm:0.5.0 bash
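Once inside the container, a quick sanity check (a minimal sketch, assuming PyTorch is importable in the image) is to confirm all four T4s are visible:

import torch

# g4dn.12xlarge exposes 4 NVIDIA T4 GPUs; with --gpus all they should
# all be visible inside the container.
assert torch.cuda.device_count() == 4, torch.cuda.device_count()
print(torch.cuda.get_device_name(0))  # expect "Tesla T4"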

I reconfigured the model's .yaml to request the accelerator_type_T4 resource, then started Ray:

ray start --head --dashboard-host=0.0.0.0 --num-cpus 48 --num-gpus 4 --resources='{"accelerator_type_T4": 4}'
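A quick way to confirm the custom resource actually registered (a minimal sketch, assuming it runs on the head node after ray start):

import ray

ray.init(address="auto")  # attach to the already-running cluster
# The output should include 'accelerator_type_T4': 4.0 alongside CPU/GPU.
print(ray.cluster_resources())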

serve run ~/serve_configs/amazon--LightGPT.yaml

It runs, but I get:

"Deployment 'VLLMDeployment: amazon--LightGPT' in application 'ray-llm' has 2 replicas that have taken more than 30s to initialize. This may be caused by a slow init or reconfigure method."

From here, nothing happens. I've let it run for up to a couple of hours, and it just hangs at this point.
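When it hangs like this, one thing worth trying (a minimal sketch, assuming a recent Ray Serve 2.x API) is to inspect the deployment state from a Python shell on the head node:

from ray import serve

# Replicas stuck in STARTING usually mean the scheduler cannot satisfy
# the resource request (e.g. the accelerator_type_T4 resource is missing
# or the placement group cannot be fulfilled).
print(serve.status())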

Any success working around these issues?

golemsentience avatar Apr 26 '24 14:04 golemsentience

I'm using vLLM as the serving engine and run inference through Ray Serve. Here is a sample script: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py

I just wrap it in a Ray Serve deployment like:

from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class VLLMPredictDeployment:
    def __init__(self, **kwargs):
        self.kwargs = kwargs  # engine args used to build the vLLM engine here
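To actually deploy it (a short sketch; VLLMPredictDeployment is the class above):

# Deploy on the running Ray cluster; Serve schedules the replica on a GPU.
serve.run(VLLMPredictDeployment.bind())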

nkwangleiGIT avatar Apr 28 '24 01:04 nkwangleiGIT

What does ray status say?

teocns avatar May 01 '24 08:05 teocns