llm-api
Request: LoRA support
Love this! Being able to spin up a local LLM easily and interact with it over tried-and-true HTTP is a dream.
I frequently use vanilla LLaMA with custom LoRAs, and need the ability to load a model with a LoRA and, ideally, switch the LoRA and even load multiple LoRAs in a given order.
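For reference, something along these lines is roughly how I load a base model plus a LoRA today. This is just a sketch using Hugging Face transformers and peft; the model name and adapter paths are placeholders, not my exact setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "huggyllama/llama-7b"   # placeholder base model
lora_path = "path/to/my-lora"             # placeholder LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, lora_path, adapter_name="default")

# Later, switch to another adapter without reloading the base model.
model.load_adapter("path/to/another-lora", adapter_name="other")
model.set_adapter("other")

prompt = "Hello!"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Ideally the API would expose something similar: load the base model once, then attach or switch adapters while it stays loaded.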
My Python is not the strongest - any chance of getting a feature like this added? I'm fairly certain we can take inspiration from text-generation-webui, which allows loading a LoRA and changing the LoRA while the model is loaded. (AFAIK, TGWUI does not yet support loading more than one LoRA at a time.)
Thanks for the feedback! I've been looking into how to add LoRA support. Would you mind sharing a snippet of how you run your model?
And what exactly do you mean by loading more than one LoRA at a time?