supports learning the combination weights of pre-trained LoRA modules
Based on #1655
Adds a `use_wlora` config option to LoraLayer that allows learning the combination weights (i.e. `wlora_weights`) of pre-trained LoRAs.
Thanks for the PR. For me to be able to review it, could you provide an example of how it should be used?
@BenjaminBossan Yes, for sure.
The code implements the learned composition from Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? (Asadi et al., 2024). More specifically, it learns the weighting vector $v$ for the weighted sum of LoRA modules as follows:
$$\hat{\mathbf{W}} = \mathbf{W}_{base} + \sum_{n=1}^{N} \hat{v}_n \left( \frac{\alpha_n}{r_n} \mathbf{A}_n \mathbf{B}_n \right), \qquad \sum_{n=1}^{N}\hat{v}_n = 1,$$

where $\hat{v}$ is the result of applying the softmax operation to the weighting vector $v$, i.e.,

$$\hat{v}_n = \frac{e^{v_n}}{\sum_{j=1}^{N} e^{v_j}}.$$
We named the parameter $v$ `wlora_weights` in the model parameters.
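For intuition, here is a minimal, standalone sketch of this weighted combination. It is not the PR's actual LoraLayer code; the class and argument names are made up for illustration, and standard PEFT shapes for the LoRA matrices are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedLoraCombination(nn.Module):
    # Sketch: combine N frozen LoRA deltas with softmax-normalized learned weights.
    def __init__(self, base_linear, lora_As, lora_Bs, alphas, ranks):
        super().__init__()
        self.base = base_linear                    # frozen W_base
        self.lora_As = nn.ParameterList(lora_As)   # each A_n: (r_n, in_features), frozen
        self.lora_Bs = nn.ParameterList(lora_Bs)   # each B_n: (out_features, r_n), frozen
        self.scales = [alpha / r for alpha, r in zip(alphas, ranks)]
        # v: one learnable weight per LoRA module (the `wlora_weights` above)
        self.wlora_weights = nn.Parameter(torch.zeros(len(lora_As)))

    def forward(self, x):
        v_hat = F.softmax(self.wlora_weights, dim=0)   # \hat{v}, sums to 1
        out = self.base(x)
        for n, (A, B) in enumerate(zip(self.lora_As, self.lora_Bs)):
            out = out + v_hat[n] * self.scales[n] * (x @ A.T @ B.T)
        return out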
Usage example
The following script is an example of how to load two pre-trained LoRA modules and learn the combination weights for LLMs.
First, we load the base model:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig
# Load base model
base_model = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
Then, we add the two LoRAs and make their combination weights trainable:
# Add the first LoRA with learnable weight to the base model
lora_1 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_1_config = PeftConfig.from_pretrained(lora_1)
lora_1_config.use_wlora = True
model.add_adapter(adapter_config=lora_1_config, adapter_name='lora_1')
# Add the second LoRA
lora_2 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_2_config = PeftConfig.from_pretrained(lora_2)
lora_2_config.use_wlora = True
model.add_adapter(adapter_config=lora_2_config, adapter_name='lora_2')
# Activate LoRA modules as trainable
model.set_adapter(['lora_1', 'lora_2'])
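At this point you can optionally check which parameters are trainable. The check below only uses standard PyTorch APIs; the `wlora` substring follows the `wlora_weights` naming described above.
# Optional sanity check: list trainable parameters after adding both adapters
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameters")
print([n for n in trainable if "wlora" in n][:4])  # a few of the wlora_weights entries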
The modules are now loaded, and you can treat the model like any other Hugging Face model or torch.nn.Module and use any training method. The following is an example using the Hugging Face Trainer.
# Train the weights of the LoRA modules
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    output_dir="wlora-model",
    evaluation_strategy="epoch",
    learning_rate=1e-4,
    weight_decay=0.01,
    push_to_hub=False,
)
# lm_dataset and data_collator are assumed to have been prepared beforehand
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_dataset["train"],
    eval_dataset=lm_dataset["test"],
    data_collator=data_collator,
)
trainer.train()
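After training, the learned combination weights can be inspected. The snippet below is a sketch that assumes they are exposed under the `wlora_weights` name described above; the exact layout depends on the implementation.
# Inspect the learned weighting parameters after training
for name, param in model.named_parameters():
    if "wlora_weights" in name:
        print(name, param.detach().cpu())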
Thanks for working on this PR and for providing an example. I also see now that you're one of the paper authors :)
I left a couple of comments on this PR. On top of that, we should probably also add a section to the docs (here) because it is not quite trivial to figure out for a user how to use this.
Moreover, I tried to come up with a test for this method. When I tried something based on the example you provided, I ran into an error though. Could you please check?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, LoraConfig, get_peft_model, PeftModel

torch.manual_seed(0)
base_model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
config = LoraConfig(init_lora_weights=False, use_wlora=True)
model = get_peft_model(model, config)
model.add_adapter("other", config)
model.base_model.set_adapter(['lora_1', 'lora_2'])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
output = model(torch.arange(10).reshape(-1, 1))
loss = output.logits.sum()
loss.backward()  # this causes: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Maybe this is related to the comment about how to set `requires_grad`, not sure.
Thanks for taking the time to read the PR. Yes, I am one of the authors. I was hoping to create an easy method for the community to combine pre-trained LoRAs :)
This is a great test script. Yes, there was an issue regarding `requires_grad` and I changed the method as you mentioned. Also, `model.base_model.set_adapter(['lora_1', 'lora_2'])` should contain the adapter names, so it should be: `model.base_model.set_adapter(['default', 'other'])`.
So with this update we can now write the following test and example usage scripts. I added the example usage to the docs.
Test
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, LoraConfig, get_peft_model, PeftModel
torch.manual_seed(0)
base_model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
config = LoraConfig(init_lora_weights=False, use_wlora=True)
model = get_peft_model(model, config)
model.add_adapter("other", config)
model.base_model.set_adapter(['default', 'other'])
# Freeze lora_A and lora_B
for name, param in model.named_parameters():
    if 'lora_A' in name or 'lora_B' in name:
        param.requires_grad = False
# Print number of trainable parameters
print('n_trainable_parameters', model.get_nb_trainable_parameters()) # 12 (layers) * 2 (lora) * 2 (q,v)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
output = model(torch.arange(10).reshape(-1, 1))
loss = output.logits.sum()
loss.backward()
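A possible extension of this test (not part of the PR, and assuming the weighting parameters contain `wlora` in their names) is to check that gradients only reach those parameters after the backward pass:
# Check that only the weighting parameters are trainable and received gradients
trainable = {n: p for n, p in model.named_parameters() if p.requires_grad}
assert all("wlora" in n for n in trainable), "unexpected trainable parameters"
assert all(p.grad is not None for p in trainable.values()), "missing gradients"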
Example usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig
# Load base model
base_model = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
# Add the first LoRA with learnable weight to the base model
lora_1 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_1_config = PeftConfig.from_pretrained(lora_1)
lora_1_config.use_wlora = True
model.add_adapter(adapter_config=lora_1_config, adapter_name='lora_1')
# Add the second LoRA
lora_2 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_2_config = PeftConfig.from_pretrained(lora_2)
lora_2_config.use_wlora = True
model.add_adapter(adapter_config=lora_2_config, adapter_name='lora_2')
# Activate LoRA modules as trainable
model.set_adapter(['lora_1', 'lora_2'])
# Freeze lora_A and lora_B
for name, param in model.named_parameters():
    if 'lora_A' in name or 'lora_B' in name:
        param.requires_grad = False
Here, I am using
model.base_model.set_adapter(['default', 'other'])
to activate the two modules, and I am using
for name, param in model.named_parameters():
    if 'lora_A' in name or 'lora_B' in name:
        param.requires_grad = False
to freeze the lora_A and lora_B layers so that only the wlora_weights remain trainable.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi @BenjaminBossan, sorry I was busy in the last three weeks. I will apply your comments and push it this week.