ray-llm Is there a way to increase the scaling up speed?

Open rifkybujana opened this issue 1 year ago • 2 comments

As the title suggests, and from what I've experienced, vLLM is slower than TGI at loading models. Is there a way to optimize it? Right now, scaling up takes 1-2 minutes per instance on AWS G5 instances.

Nov 30 '23 09:11 rifkybujana

What models are you using?

Dec 14 '23 19:12 akshay-anyscale

Llama 7B quantized with AWQ.

Jan 01 '24 06:01 rifkybujana
