akshay-anyscale
Ready to merge, pending @aslonnie's approval.
Hi @lamhoangtung, can you try using the `serve run` command instead? You can refer to the README here for example usage: https://github.com/ray-project/ray-llm
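As a rough sketch, the command looks like this (the config path is one of the bundled serve_configs and may differ for your model):

```shell
# Serve a model from one of the bundled config files (path is illustrative)
serve run serve_configs/meta-llama--Llama-2-7b-chat-hf.yaml
```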
Can you share the model YAMLs that you are using? You'll need to set `num_gpus_per_worker` to 0.5 for both.
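For reference, a minimal sketch of the relevant section of each model YAML, assuming the standard RayLLM `scaling_config` layout (all other fields omitted):

```yaml
# Only the scaling section is shown; values here are illustrative.
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 0.5  # fractional GPU so the two models can share one GPU
```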
Are you looking to fine-tune LLMs? RayLLM is currently meant only for inference, but we do have examples of how to do fine-tuning with Ray: https://docs.ray.io/en/latest/train/examples/lightning/vicuna_13b_lightning_deepspeed_finetune.html#fine-tune-vicuna-13b-with-lightning-and-deepspeed
Can you provide the code you are using for querying?
What models are you using?
Yes, you should be able to set up observability using the general Ray Serve guides. For the custom metrics, you can use the "ray_aviary" prefix to search for them (e.g. if you're using...
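For example, an illustrative PromQL selector that lists every metric under that prefix:

```promql
# Match all RayLLM/Aviary metrics by name prefix
{__name__=~"ray_aviary_.*"}
```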
Hi @roelschr, I believe this is because we do some batching (up to 100ms) to make streaming more efficient. If you make the denominator "ray_aviary_tokens_generated" instead, this should be closer to...
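As a sketch, assuming the metric is a Prometheus counter, the denominator would be a rate over it; only `ray_aviary_tokens_generated` is taken from the reply above, and the window is illustrative:

```promql
# Tokens generated per second, for use as the denominator of a per-token ratio
sum(rate(ray_aviary_tokens_generated[1m]))
```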
Try using `serve run serve_configs/meta-llama--Llama-2-7b-chat-hf.yaml`. I'll fix the docs to reflect that.
Docs fixed here: https://github.com/ray-project/ray-llm/pull/85