akshay-anyscale

17 comments

Ready to merge, pending @aslonnie's approval.

Hi @lamhoangtung, can you try using the `serve run` command instead? You can refer to the README for example usage: https://github.com/ray-project/ray-llm
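For example, following the pattern in the repo's serve_configs/ directory (the exact yaml filename depends on which model you want to serve):

```shell
# Sketch of the README usage pattern; substitute the config file
# for the model you actually want to deploy.
serve run serve_configs/meta-llama--Llama-2-7b-chat-hf.yaml
```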

Can you share the model yamls that you are using? You'll need to set num_gpus_per_worker to 0.5 for both.
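As a rough sketch, the setting lives in each model's scaling config; the surrounding keys here are illustrative and may differ from your yamls:

```yaml
# Hypothetical fragment of a RayLLM model yaml; num_gpus_per_worker is
# the setting being discussed, the other keys are illustrative.
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 0.5  # lets two models share a single GPU
```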

Are you looking to fine-tune LLMs? RayLLM is currently only meant for inference, but we do have examples of how to do fine-tuning with Ray - https://docs.ray.io/en/latest/train/examples/lightning/vicuna_13b_lightning_deepspeed_finetune.html#fine-tune-vicuna-13b-with-lightning-and-deepspeed

Can you provide the code you are using for querying?
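For reference, a minimal query sketch, assuming a locally running RayLLM deployment exposing its OpenAI-compatible API on the default Serve port; the URL and model id are assumptions, so substitute your own:

```python
import requests

# Minimal sketch: query a RayLLM deployment through its OpenAI-compatible
# chat completions endpoint. The address and model id below are
# assumptions -- adjust them to match your deployment.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json())
```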

Yes, you should be able to set up observability using the general Ray Serve guides. For the custom metrics, you can use "ray_aviary" to search the metrics (e.g. if you're using...

Hi @roelschr, I believe this is because we do some batching (up to 100 ms) to make streaming more efficient. If you make the denominator "ray_aviary_tokens_generated" instead, this should be closer to...

Try using `serve run serve_configs/meta-llama--Llama-2-7b-chat-hf.yaml`. I'll fix the docs to reflect that.

Docs fixed here: https://github.com/ray-project/ray-llm/pull/85