
How to apply LoRA adapter to a model loaded with OVModelForCausalLM()?

Open nai-kon opened this issue 10 months ago • 4 comments

In the transformers library, we can load multiple adapters onto a base model with load_adapter and then switch between them with set_adapter, like below.

from transformers import AutoModelForCausalLM

# base model
model = AutoModelForCausalLM.from_pretrained(model_name)

# load multiple adapters
model.load_adapter("model/adapter1/", "adapter1")
model.load_adapter("model/adapter2/", "adapter2")

# switch adapter
model.set_adapter("adapter2")

Now I want to apply LoRA adapters with OpenVINO, but I can't find an example of it. Is it possible to do it with OVModelForCausalLM?

nai-kon avatar Mar 29 '24 01:03 nai-kon

You probably can't do that once the model is loaded/exported to OpenVINO (or any framework with a static computation graph). What you can do instead is fuse one or more adapters into the base model and export the result to OpenVINO:

  • Load the model using AutoModelForCausalLM
  • Load your target adapters the way you usually would
  • Create PeftModel and use its merge_and_unload method to fuse the adapters into the base model
  • Save the fused model locally or push to the Hub.
  • Load it with OVModelForCausalLM
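
The steps above could be sketched roughly as follows (a minimal, untested sketch: the function name, the adapter directory, and the output directory are placeholders, and it fuses a single adapter; it assumes transformers, peft, and optimum-intel are installed):

```python
def merge_lora_and_export(model_name, adapter_dir, output_dir):
    from transformers import AutoModelForCausalLM
    from peft import PeftModel
    from optimum.intel import OVModelForCausalLM

    # 1. Load the base model with transformers
    base = AutoModelForCausalLM.from_pretrained(model_name)

    # 2.-3. Attach the adapter, then fuse its weights into the base model
    peft_model = PeftModel.from_pretrained(base, adapter_dir)
    merged = peft_model.merge_and_unload()

    # 4. Save the fused model locally (push_to_hub would also work)
    merged.save_pretrained(output_dir)

    # 5. Re-load the fused model and export it to OpenVINO IR
    return OVModelForCausalLM.from_pretrained(output_dir, export=True)
```

Note that after merge_and_unload the adapter weights are baked into the base weights, so the exported OpenVINO model has no notion of a separate adapter anymore.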

IlyasMoutawwakil avatar Apr 10 '24 08:04 IlyasMoutawwakil

Thank you for your detailed answer. I understood that dynamic switching is difficult with static graph frameworks such as OpenVINO.

nai-kon avatar Apr 14 '24 23:04 nai-kon

An additional question: I understand that multiple adapters can be merged into the model using merge_and_unload(), but is it possible to load a model that contains multiple adapters with OVModelForCausalLM and switch between them? Or do I need to merge each adapter into its own model, so that three adapters would require three merged models? If so, my concern is that the total file size grows in proportion to the number of adapters.

nai-kon avatar Apr 15 '24 00:04 nai-kon

For now it seems impossible to me, but my understanding of the OpenVINO runtime is still quite limited. This issue https://github.com/openvinotoolkit/openvino/issues/21806 requests an API for exactly this (keeping the same model while changing weights between inference requests). Apparently, in theory, it should be possible using the State API; I will look into it.

IlyasMoutawwakil avatar Apr 25 '24 08:04 IlyasMoutawwakil

Thank you for the information!

nai-kon avatar Aug 03 '24 12:08 nai-kon