supports learning the combination weights of pre-trained LoRA modules
Based on #1655
Adds a `use_wlora` config option to LoraLayer that allows learning the combination weights (i.e. `wlora_weights`) of pre-trained LoRAs.
Thanks for the PR. For me to be able to review it, could you provide an example of how it should be used?
@BenjaminBossan Yes, for sure.
The code implements the learned composition from Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? (Asadi et al., 2024). More specifically, it learns the weighting vector $v$ for the weighted sum of LoRA modules as follows:
$$\hat{\mathbf{W}} = \mathbf{W}_{base} + \sum_{n=1}^{N} \hat{v}_n \left( \frac{\alpha_n}{r_n} \mathbf{A}_n \mathbf{B}_n \right), \qquad \sum_{n=1}^{N}\hat{v}_n = 1,$$

where $\hat{v}$ is the result of applying the softmax operation to the weighting vector $v$, i.e.,

$$\hat{v}_n = \frac{e^{v_n}}{\sum_{j=1}^{N} e^{v_j}}.$$
We named the parameter $v$ `wlora_weights` in the model parameters.
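For intuition, here is a minimal, standalone sketch of this weighted combination. It is not the PR's actual LoraLayer code; the class and argument names are made up for illustration, and standard PEFT shapes for the LoRA matrices are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedLoraCombination(nn.Module):
    # Sketch: combine N frozen LoRA deltas with softmax-normalized learned weights.
    def __init__(self, base_linear, lora_As, lora_Bs, alphas, ranks):
        super().__init__()
        self.base = base_linear                    # frozen W_base
        self.lora_As = nn.ParameterList(lora_As)   # each A_n: (r_n, in_features), frozen
        self.lora_Bs = nn.ParameterList(lora_Bs)   # each B_n: (out_features, r_n), frozen
        self.scales = [alpha / r for alpha, r in zip(alphas, ranks)]
        # v: one learnable weight per LoRA module (the `wlora_weights` above)
        self.wlora_weights = nn.Parameter(torch.zeros(len(lora_As)))

    def forward(self, x):
        v_hat = F.softmax(self.wlora_weights, dim=0)   # \hat{v}, sums to 1
        out = self.base(x)
        for n, (A, B) in enumerate(zip(self.lora_As, self.lora_Bs)):
            out = out + v_hat[n] * self.scales[n] * (x @ A.T @ B.T)
        return out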
Usage example
The following script is an example of how to load two pre-trained LoRA modules and learn the combination weights for LLMs.
First, we load the base model:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig
# Load base model
base_model = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
Then, we add the two LoRAs and make their combination weights trainable:
# Add the first LoRA with learnable weight to the base model
lora_1 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_1_config = PeftConfig.from_pretrained(lora_1)
lora_1_config.use_wlora = True
model.add_adapter(adapter_config=lora_1_config, adapter_name='lora_1')
# Add the second LoRA
lora_2 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_2_config = PeftConfig.from_pretrained(lora_2)
lora_2_config.use_wlora = True
model.add_adapter(adapter_config=lora_2_config, adapter_name='lora_2')
# Activate LoRA modules as trainable
model.set_adapter(['lora_1', 'lora_2'])
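At this point you can optionally check which parameters are trainable. The check below only uses standard PyTorch APIs; the `wlora` substring follows the `wlora_weights` naming described above.
# Optional sanity check: list trainable parameters after adding both adapters
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameters")
print([n for n in trainable if "wlora" in n][:4])  # a few of the wlora_weights entries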
The modules are now loaded, and you can treat the model like any other Hugging Face model or torch.nn.Module and use any training method. The following is an example using the Hugging Face Trainer.
# Train the weights of the LoRA modules
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
    output_dir="wlora-model",
    evaluation_strategy="epoch",
    learning_rate=1e-4,
    weight_decay=0.01,
    push_to_hub=False,
)
# lm_dataset and data_collator are assumed to have been prepared beforehand
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_dataset["train"],
    eval_dataset=lm_dataset["test"],
    data_collator=data_collator,
)
trainer.train()
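After training, the learned combination weights can be inspected. The snippet below is a sketch that assumes they are exposed under the `wlora_weights` name described above; the exact layout depends on the implementation.
# Inspect the learned weighting parameters after training
for name, param in model.named_parameters():
    if "wlora_weights" in name:
        print(name, param.detach().cpu())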
Thanks for working on this PR and for providing an example. I also see now that you're one of the paper authors :)
I left a couple of comments on this PR. On top of that, we should probably also add a section to the docs (here) because it is not quite trivial to figure out for a user how to use this.
Moreover, I tried to come up with a test for this method. When I tried something based on the example you provided, I ran into an error though. Could you please check?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, LoraConfig, get_peft_model, PeftModel

torch.manual_seed(0)
base_model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
config = LoraConfig(init_lora_weights=False, use_wlora=True)
model = get_peft_model(model, config)
model.add_adapter("other", config)
model.base_model.set_adapter(['lora_1', 'lora_2'])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
output = model(torch.arange(10).reshape(-1, 1))
loss = output.logits.sum()
loss.backward()  # this causes: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Maybe this is related to the comment about how to set `requires_grad`, not sure.
Thanks for taking the time to read the PR. Yes, I am one of the authors. I was hoping to create an easy method for the community to combine pre-trained LoRAs :)
This is a great test script. Yes, there was an issue regarding `requires_grad` and I changed the method as you mentioned. Also, `model.base_model.set_adapter(['lora_1', 'lora_2'])` should contain the adapter names, so it should be: `model.base_model.set_adapter(['default', 'other'])`.
So with this update we can now write the following test and example usage scripts. I added the example usage to the docs.
Test
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, LoraConfig, get_peft_model, PeftModel
torch.manual_seed(0)
base_model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
config = LoraConfig(init_lora_weights=False, use_wlora=True)
model = get_peft_model(model, config)
model.add_adapter("other", config)
model.base_model.set_adapter(['default', 'other'])
# Freeze lora_A and lora_B
for name, param in model.named_parameters():
    if 'lora_A' in name or 'lora_B' in name:
        param.requires_grad = False
# Print number of trainable parameters
print('n_trainable_parameters', model.get_nb_trainable_parameters()) # 12 (layers) * 2 (lora) * 2 (q,v)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
output = model(torch.arange(10).reshape(-1, 1))
loss = output.logits.sum()
loss.backward()
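A possible extension of this test (not part of the PR, and assuming the weighting parameters contain `wlora` in their names) is to check that gradients only reach those parameters after the backward pass:
# Check that only the weighting parameters are trainable and received gradients
trainable = {n: p for n, p in model.named_parameters() if p.requires_grad}
assert all("wlora" in n for n in trainable), "unexpected trainable parameters"
assert all(p.grad is not None for p in trainable.values()), "missing gradients"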
Example usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig
# Load base model
base_model = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
# Add the first LoRA with learnable weight to the base model
lora_1 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_1_config = PeftConfig.from_pretrained(lora_1)
lora_1_config.use_wlora = True
model.add_adapter(adapter_config=lora_1_config, adapter_name='lora_1')
# Add the second LoRA
lora_2 = "varun-v-rao/opt-350m-lora-1.57M-squad-model3"
lora_2_config = PeftConfig.from_pretrained(lora_2)
lora_2_config.use_wlora = True
model.add_adapter(adapter_config=lora_2_config, adapter_name='lora_2')
# Activate LoRA modules as trainable
model.set_adapter(['lora_1', 'lora_2'])
# Freeze lora_A and lora_B
for name, param in model.named_parameters():
    if 'lora_A' in name or 'lora_B' in name:
        param.requires_grad = False
Here, I am using
model.base_model.set_adapter(['default', 'other'])
to activate the two modules, and I am using
for name, param in model.named_parameters():
    if 'lora_A' in name or 'lora_B' in name:
        param.requires_grad = False
to freeze the lora_A and lora_B layers so that only the wlora_weights remain trainable.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi @BenjaminBossan, sorry I was busy in the last three weeks. I will apply your comments and push it this week.