OpenLLM

Question about multi-adapter

Open · KyrieCui opened this issue 2 years ago · 1 comment

I ran into a question when using multi-adapter. It works when loading different PEFT adapters and calling them by adapter_name / adapter_id. However, can I also call the vanilla LLM? For example, if I deploy Llama 2 with multiple adapters, can I disable the adapters and run inference with the original Llama 2 model through the framework? Looking forward to your reply.
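
For reference, here is roughly what I mean, sketched with the plain PEFT API (the adapter repos and names below are placeholders, not OpenLLM's interface):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# Attach two LoRA adapters and address them by adapter_name
# ("my-org/sql-lora" and "my-org/chat-lora" are hypothetical adapter repos).
model = PeftModel.from_pretrained(base, "my-org/sql-lora", adapter_name="sql")
model.load_adapter("my-org/chat-lora", adapter_name="chat")

# Routing a request to a specific adapter works fine:
model.set_adapter("chat")
inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)

# The question: can the same deployment also serve the bare Llama 2
# weights, i.e. run inference with no adapter applied?
```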

KyrieCui · Nov 15 '23 04:11

Currently we don't yet support unloading LoRA layers. From what I have tested so far, unloading models from memory is pretty slow when around 10-15 layers are loaded.

Another approach would be not to attach the LoRA layers when loading the model into memory, and instead load them dynamically on request. But imagine a distributed environment: there is no way to ensure that all model pods will load the adapter correctly.
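
As a rough illustration of why per-request loading is fragile, here is a hypothetical handler (the names are made up, not OpenLLM internals): each replica attaches the adapter lazily, so a failed download or version skew on a single pod silently changes what that pod serves.

```python
# Hypothetical per-replica request handler; illustrative only.
loaded_adapters: set[str] = set()

def generate_with_adapter(model, tokenizer, prompt, adapter_id, adapter_name):
    # Lazily attach the adapter the first time this replica sees it.
    # Every pod does this independently, so there is no single point
    # that guarantees all replicas loaded the same adapter successfully.
    if adapter_name not in loaded_adapters:
        model.load_adapter(adapter_id, adapter_name=adapter_name)
        loaded_adapters.add(adapter_name)
    model.set_adapter(adapter_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    return model.generate(**inputs, max_new_tokens=64)
```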

I think for multi-adapter deployments the ability to use the base model could be supported, but it is probably very low priority right now.
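
If this does get picked up, one possible mechanism is PEFT's own `disable_adapter()` context manager, which bypasses the LoRA layers for the duration of the block without unloading them (reusing `model` / `tokenizer` from the sketch above):

```python
# Sketch: run the base model without removing the loaded adapters.
with model.disable_adapter():
    inputs = tokenizer("Summarize the report:", return_tensors="pt")
    base_out = model.generate(**inputs, max_new_tokens=64)
```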

aarnphm · Nov 15 '23 05:11