The output only has adapter_config.json and adapter_model.bin
I used LoRA to fine-tune a model, but the final output does not have a checkpoint, only adapter_config.json and adapter_model.bin.
That is right, only two files.
Hello, that is the expected output, as the final checkpoints with PEFT methods are tiny. You can load them via PeftModel.from_pretrained(base_model, peft_model_name_or_path).
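For reference, a minimal loading sketch; the model class, names, and paths below are placeholders for your own setup:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# base model you fine-tuned on top of (placeholder name)
base_model = AutoModelForCausalLM.from_pretrained("your-base-model")

# directory containing adapter_config.json and adapter_model.bin
model = PeftModel.from_pretrained(base_model, "path/to/adapter-output")
```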
Hi @pacman100, could you explain why the code is structured such that you must provide the base_model? It seems to me that the base_model is already present in the adapter_config.json, and thus we should be able to call PeftModel.from_pretrained(peft_model_name_or_path) and the base_model should be loaded internally. Ideally, we can even call AutoModel.from_pretrained(peft_model_name_or_path) and the user is oblivious as to whether the underlying weights are coming from a standard or a PEFT model.
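One way to get close to that today is to read the base model name out of the adapter config yourself. A sketch, assuming a causal LM and a local adapter directory (paths are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftConfig, PeftModel

peft_model_name_or_path = "path/to/adapter-output"  # placeholder

# adapter_config.json records which base model the adapter was trained on
config = PeftConfig.from_pretrained(peft_model_name_or_path)
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_name_or_path)
```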
Is there a way to save the base model and the adapter merged into one checkpoint?
I think I found my answer:
https://github.com/tloen/alpaca-lora/blob/main/export_state_dict_checkpoint.py
@clxyder, with the latest main branch, you can simply do model = model.merge_and_unload() to get the base model with the LoRA weights merged into it.
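Spelled out end to end, a merge-and-save sketch (model names and paths are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("your-base-model")
model = PeftModel.from_pretrained(base_model, "path/to/adapter-output")

# fold the LoRA weights into the base weights and drop the adapter wrappers
model = model.merge_and_unload()

# the result is a plain transformers model, so this writes one standard checkpoint
model.save_pretrained("path/to/merged-model")
```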
Thank you for the answer! Is there a way I can update or set the state_dict without reloading it again?
What would it take to support GPT2 models in merge_and_unload? I'm getting the error: "GPT2 models are not supported for merging LORA layers".
@pacman100 @younesbelkada Hi, thanks for this cool library! I'm new to PEFT. The model.merge_and_unload() method looks like magic to me. Could you give a quick introduction to model.merge_and_unload()?
In my view, LoRA adds new trainable parameters/layers and inserts these layers into the base model; that is, the LoRA model has additional structures on top of the base model. And we can save the model returned by merge_and_unload and reload it with the base_model.from_pretrained(unloaded_model_path) interface. But where are the additional layers and parameters?
Hi @Opdoop,
Thanks for raising this! In a nutshell, here is a diagram that explains how the merging works under the hood:

So during training, you have these two independent modules (A & B) that are trainable. In that scenario, the output hidden states h can be computed as:
h = (Wx + BAx) + b
During training, as you only want to update A & B, you can't simplify the mathematical expression and run the computation in a single matrix multiplication.
However, once A & B have been trained, you can "merge" these weights by simply adding them to W as follows:
W_merged = (W + BA)
since
W_merged x + b = (Wx + BAx) + b
The merged model is exactly equivalent to the un-merged model, but this time you only need a single weight, W_merged.
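For anyone who wants to sanity-check the algebra numerically, here is a tiny stand-alone sketch (plain NumPy, not PEFT internals; the shapes and rank are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4          # LoRA rank r << d

W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # trainable down-projection
B = rng.normal(size=(d_out, r))     # trainable up-projection
b = rng.normal(size=d_out)          # bias
x = rng.normal(size=d_in)           # input

h_unmerged = (W @ x + B @ (A @ x)) + b   # h = (Wx + BAx) + b
W_merged = W + B @ A                     # W_merged = (W + BA)
h_merged = W_merged @ x + b

print(np.allclose(h_unmerged, h_merged))  # True
```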
Let us know if anything else is unclear
@younesbelkada Cool! A big thanks to you! This explanation solved my confusion in a very simple and concrete way. Thanks for this beautiful diagram and math explanation! 🌹🌹🌹
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.