ray-llm
Is there a way to increase the scale-up speed?
As the title suggests, and from what I've experienced, vLLM is slower than TGI in terms of model loading. Is there a way to optimize it? As of right now, it takes 1-2 minutes to scale up instances using AWS G5 instances.
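One knob worth checking, as a sketch rather than a confirmed fix: ray-llm deployments run on Ray Serve, whose `autoscaling_config` exposes settings like `min_replicas` and `upscale_delay_s`. Keeping a warm replica and shortening the upscale delay won't make vLLM load weights any faster, but it can hide the cold start from traffic. The deployment class below is a hypothetical placeholder, not ray-llm's actual config schema.

```python
# Minimal Ray Serve sketch (assumptions: Ray 2.x; LlamaDeployment is a
# hypothetical stand-in for whatever ray-llm deploys under the hood).
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,           # keep one replica warm so requests rarely hit a cold start
        "max_replicas": 4,
        "upscale_delay_s": 10.0,     # react to load sooner than the 30s default
        "downscale_delay_s": 600.0,  # scale down conservatively to avoid thrashing
    },
    ray_actor_options={"num_gpus": 1},  # one GPU per replica on a g5 node
)
class LlamaDeployment:
    def __init__(self):
        # Engine initialization (the slow part) would happen here.
        ...

    async def __call__(self, request):
        ...
```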
What models are you using?
Llama 7B quantized with AWQ
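To isolate how much of the 1-2 minutes is vLLM engine startup versus node provisioning and image pulls, it may help to time the engine load directly on a g5 node. A rough sketch; the AWQ checkpoint id is an assumption, substitute the actual model path.

```python
# Rough timing sketch (assumption: weights are already on local disk or in
# the Hugging Face cache, so this measures load time, not download time).
import time

from vllm import LLM

start = time.time()
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # assumed checkpoint id; use your own
    quantization="awq",
)
print(f"vLLM engine ready in {time.time() - start:.1f}s")
```

If this number is small relative to the total scale-up time, the bottleneck is more likely instance provisioning or weight download than vLLM itself, in which case caching the weights on the node image or nearby storage is the kind of fix to look at.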