
Not able to use QLoRA models with vLLM

royrajjyoti1 opened this issue 2 years ago

I have trained a Falcon-7B model with QLoRA, but inference is too slow, so I want to use vLLM to speed it up. I load the checkpoint with
llm = LLM(model="/content/trained-model/"), but I get this error:

OSError: /content/trained-model/ does not appear to have a file named config.json. Checkout 
'https://huggingface.co//content/trained-model//None' for available files.
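This error usually means the directory contains only the PEFT adapter that QLoRA training writes out, not a full Hugging Face checkpoint. A quick way to confirm (a sketch, assuming the path from the snippet above; the exact file names depend on the training script):

import os
# An adapter-only directory typically holds adapter_config.json and
# adapter_model.bin (or .safetensors) but no config.json or full weights,
# which is why vLLM's loader fails here.
print(os.listdir("/content/trained-model/"))
print(os.path.exists("/content/trained-model/config.json"))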

royrajjyoti1 avatar Jun 26 '23 06:06 royrajjyoti1

Thank you @zhuohan123 for the reply. Can you give me an estimated date for Falcon support? I checked four days ago and it was said to be coming in a few days. (https://github.com/vllm-project/vllm/issues/195)

royrajjyoti1 avatar Jun 28 '23 10:06 royrajjyoti1

You just need to merge the model. vLLM doesn't support LoRA.

ehartford avatar Sep 16 '23 02:09 ehartford
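For reference, the merge can be done with PEFT before handing the directory to vLLM. A minimal sketch, assuming the adapter in /content/trained-model/ was trained on tiiuae/falcon-7b and that transformers and peft are installed; the merged output path is hypothetical:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from vllm import LLM

# Load the base model in half precision and attach the QLoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", torch_dtype=torch.float16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "/content/trained-model/")

# Fold the LoRA weights into the base weights and save a full checkpoint
# (this writes config.json, which vLLM needs).
merged = model.merge_and_unload()
merged.save_pretrained("/content/merged-model/")
AutoTokenizer.from_pretrained("tiiuae/falcon-7b").save_pretrained("/content/merged-model/")

# vLLM can now load the merged directory.
llm = LLM(model="/content/merged-model/", trust_remote_code=True)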

@ehartford but merging has to be done in higher precision. Doesn't that defeat the purpose of keeping the base weights in low precision to speed up inference?

fabianlim avatar Jan 23 '24 02:01 fabianlim
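One common workaround (a sketch, not something confirmed in this thread): merge in fp16 as above, then post-quantize the merged checkpoint into a format vLLM can serve, such as AWQ or GPTQ, and pass the matching quantization flag. The path below is hypothetical and assumes an AWQ-quantized copy of the merged model was produced separately (e.g. with AutoAWQ):

from vllm import LLM, SamplingParams

# Serve the re-quantized merged model; weights stay in low precision at inference.
llm = LLM(model="/content/merged-model-awq/", quantization="awq", trust_remote_code=True)
outputs = llm.generate(["What is QLoRA?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)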

Closing in favour of the feature request #3225

hmellor avatar Mar 08 '24 11:03 hmellor