vllm
Not able to use QLoRA models with vLLM
I have trained a Falcon 7B model with QLoRA, but inference is too slow, so I want to use vLLM to speed it up. I used this code snippet to load the model path:
llm = LLM(model="/content/trained-model/")
But I am getting this error:
OSError: /content/trained-model/ does not appear to have a file named config.json. Checkout
'https://huggingface.co//content/trained-model//None' for available files.
Thank you @zhuohan123 for the reply. Can you provide an ETA for the Falcon model? I checked 4 days ago and it said support would land in a few days (https://github.com/vllm-project/vllm/issues/195).
You just need to merge the model. vLLM doesn't support LoRA.
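For reference, a minimal sketch of merging the adapter with PEFT before pointing vLLM at the result. The base checkpoint name tiiuae/falcon-7b and the output path /content/merged-model/ are assumptions here; only /content/trained-model/ comes from the original post.

```python
# Sketch: fold a QLoRA/LoRA adapter into its base model so vLLM can load it.
# Assumes the adapter in /content/trained-model/ was trained from tiiuae/falcon-7b;
# the merged output path /content/merged-model/ is just an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",            # assumed base checkpoint
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "/content/trained-model/")
merged = model.merge_and_unload()  # merges the LoRA weights into the base weights
merged.save_pretrained("/content/merged-model/")

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
tokenizer.save_pretrained("/content/merged-model/")

# The merged directory now contains a full config.json, so vLLM can load it:
from vllm import LLM
llm = LLM(model="/content/merged-model/", trust_remote_code=True)
```

Note that this loads and saves the merged weights in half precision, which is what the next comment is getting at.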
@ehartford But merging has to be done in higher precision. Doesn't that defeat the purpose of keeping the base weights in low precision to speed up inference?
Closing in favour of the feature request #3225