vllm
Not able to use QLoRA models with vLLM
I have trained a Falcon 7B model with QLoRA, but inference is too slow, so I want to use vLLM to speed it up. I used this code snippet to load the model path:
llm = LLM(model="/content/trained-model/")
But I am getting this error:
OSError: /content/trained-model/ does not appear to have a file named config.json. Checkout
'https://huggingface.co//content/trained-model//None' for available files.
Thank you @zhuohan123 for the reply. Can you provide an ETA for the Falcon model? I checked 4 days ago and it said support would land in a few days (https://github.com/vllm-project/vllm/issues/195).
You just need to merge the model. vLLM doesn't support LoRA.
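For reference, a minimal sketch of merging the adapter with PEFT before pointing vLLM at the result. The base checkpoint name tiiuae/falcon-7b and the output path /content/merged-model/ are assumptions here; only /content/trained-model/ comes from the original post.

```python
# Sketch: fold a QLoRA/LoRA adapter into its base model so vLLM can load it.
# Assumes the adapter in /content/trained-model/ was trained from tiiuae/falcon-7b;
# the merged output path /content/merged-model/ is just an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",            # assumed base checkpoint
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "/content/trained-model/")
merged = model.merge_and_unload()  # merges the LoRA weights into the base weights
merged.save_pretrained("/content/merged-model/")

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
tokenizer.save_pretrained("/content/merged-model/")

# The merged directory now contains a full config.json, so vLLM can load it:
from vllm import LLM
llm = LLM(model="/content/merged-model/", trust_remote_code=True)
```

Note that this loads and saves the merged weights in half precision, which is what the next comment is getting at.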
@ehartford But merging has to be done in higher precision. Doesn't that defeat the purpose of keeping the base weights in low precision to speed up inference?
Closing in favour of the feature request #3225