
Is there a way to increase the scaling up speed?

Open rifkybujana opened this issue 1 year ago • 2 comments

As the title suggests: in my experience, vLLM is slower than TGI (Text Generation Inference) at loading models. Is there a way to optimize this? Right now, scaling up a new instance takes 1-2 minutes on AWS G5 instances.

rifkybujana, Nov 30 '23 09:11

What models are you using?

akshay-anyscale, Dec 14 '23 19:12

Llama 7B, quantized with AWQ.

rifkybujana, Jan 01 '24 06:01