Mamba + LoRA: after merge_and_unload() does not work well
System Info
- python: 3.10.13
- gpu: yes
- transformers==4.39.3
- accelerate==0.29.0
- peft==0.10.0
Who can help?
@arthur
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
TRAIN

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

device = "cuda"  # per the system info above (gpu: yes)

pretrained_model_name = "state-spaces/mamba-370m-hf"
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name)

# LoRA config
config_lora = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.01,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=["x_proj", "in_proj", "out_proj", "dt_proj", "lm_head"],
)

# LoRA model
model = get_peft_model(model, config_lora)
model.print_trainable_parameters()

# Model to device
model.to(device)

# Then: set Trainer and train()
```
INFERENCE

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

device = "cuda"  # per the system info above (gpu: yes)

model_path = experiment_path / "model" / ckpt  # experiment_path / ckpt defined elsewhere
pretrained_model_name = "state-spaces/mamba-370m-hf"
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name)
model = PeftModel.from_pretrained(model, model_path, is_trainable=False)
print(model)
model = model.merge_and_unload(progressbar=True)
print(model)
model.to(device)
model = torch.compile(model)
```
Expected behavior
Without calling `merge_and_unload()` it seems to work, but after calling `merge_and_unload()` printing the model shows no LoRA layers and the generations are quite bad (as if we were using just the original base model without any fine-tuning). It seems to be an issue with the Mamba model being used with LoRA. Thanks!
cc @younesbelkada
cc @Aniketh999
I'm having the same issue here:
Original model: https://huggingface.co/dominguesm/mambarim-110m
Adapter: https://huggingface.co/dominguesm/mambarim-110m-chat

When I run the `merge_and_unload` function, the model simply loses the fine-tuning.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi! @amyeroberts, any news on this one? :) Thanks!
Hi @javiermcebrian - apologies for the delay, I am on it!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
hi @younesbelkada! The issue is marked as stale, how is it going? :) thanks!
@javiermcebrian Sadly @younesbelkada isn't at HF anymore :( Perhaps @BenjaminBossan for PEFT or @ArthurZucker for Mamba will know the state of this?
Thanks for the ping. The issue is that PEFT merges the LoRA weights into the `lm_head`, since you added it to `target_modules`. However, the weights of the LM head are tied to the embedding weights, so those are mutated too after the merge, which results in wrong outputs.
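For a quick sanity check, the tying can be observed directly on the base checkpoint (a minimal sketch using the generic `get_input_embeddings` / `get_output_embeddings` accessors):

```python
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-370m-hf")

# If the LM head is tied to the embeddings, both accessors return modules that
# share the very same weight tensor, so merging LoRA into one mutates the other.
print(base.get_output_embeddings().weight is base.get_input_embeddings().weight)
# expected: True for a tied checkpoint, per the explanation above
```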
To remedy this, I would suggest not to target the LM head with LoRA. If you need to train it, you can fully fine-tune it by passing `modules_to_save=["lm_head"]` to `LoraConfig`. Alternatively, you can untie the two weights.
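For reference, a minimal sketch of that suggestion applied to the config from the reproduction above (same hyperparameters, only the handling of the head changes):

```python
from peft import LoraConfig, TaskType

config_lora = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.01,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    # "lm_head" removed, so nothing gets merged into the tied head weight
    target_modules=["x_proj", "in_proj", "out_proj", "dt_proj"],
    # fully fine-tune the head instead; PEFT trains a separate copy of the module
    modules_to_save=["lm_head"],
)
```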
Ok thanks!
BTW, I suppose that if with LoRA I avoid targeting the LM head but still target the embeddings, then after `merge_and_unload()` the LM head will get mutated again, resulting in wrong results, since the weights were trained assuming a frozen LM head, right? And a similar thing would happen with a full fine-tune of the LM head using `modules_to_save`, right?
Then, the only possibilities I see are: 1) untie the original LM head and embedding weights, or 2) tie the LoRA weights of the LM head and embeddings. Am I right?
> I suppose that if with LoRA I avoid targeting the LM head but still target the embeddings, then after `merge_and_unload()` the LM head will get mutated again, resulting in wrong results, since the weights were trained assuming a frozen LM head, right?
Right. You could store a copy of the LM head and override the LM head after the merge using that copy.
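A sketch of that workaround on top of the inference script above (assuming the head is exposed as `lm_head`, as in `MambaForCausalLM`, and that only the embeddings carry LoRA, so the head itself has no adapter):

```python
import copy

# Keep an independent copy of the pre-merge head before the embeddings are merged.
original_head = copy.deepcopy(model.get_base_model().get_output_embeddings())

model = model.merge_and_unload(progressbar=True)

# Swap the saved head back in; installing a fresh module also breaks the tie to
# the (now merged) embedding weights.
model.lm_head = original_head
model.config.tie_word_embeddings = False  # avoid re-tying the weights on save/load
```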
> And a similar thing would happen with a full fine-tune of the LM head using `modules_to_save`, right?
No, `modules_to_save` creates a new copy of the targeted module, so there is no issue with tied weights. Only if you were to switch back to the original LM head (e.g. when you disable LoRA) would you still see the issue.
> Then, the only possibilities I see are: 1) untie the original LM head and embedding weights, or 2) tie the LoRA weights of the LM head and embeddings. Am I right?
Apart from what I said above, this should also work, yes.
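And for completeness, option (1) could look like this before attaching LoRA (again just a sketch, assuming a `lm_head` attribute as in `MambaForCausalLM`):

```python
import copy

# Replace the tied head with an independent copy of its current weights, then
# stop transformers from re-tying them, so embedding merges can no longer leak in.
model.lm_head = copy.deepcopy(model.lm_head)
model.config.tie_word_embeddings = False
```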
thanks!