
Mamba + LoRA: after merge_and_unload() does not work well

Open javiermcebrian opened this issue 10 months ago • 10 comments

System Info

python: 3.10.13 gpu: yes transformers==4.39.3 accelerate==0.29.0 peft==0.10.0

Who can help?

@arthur

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

TRAIN

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pretrained_model_name = "state-spaces/mamba-370m-hf"
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name)
# LoRA config
config_lora = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.01,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=["x_proj", "in_proj", "out_proj", "dt_proj", "lm_head"],
)
# Wrap the base model with the LoRA adapters
model = get_peft_model(model, config_lora)
model.print_trainable_parameters()
# Model to device
model.to(device)
# Then: set up a Trainer and call train()

INFERENCE

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_path = experiment_path / "model" / ckpt
pretrained_model_name = "state-spaces/mamba-370m-hf"
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name)
# Load the trained LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, model_path, is_trainable=False)
print(model)
# Merge the LoRA weights into the base model and drop the adapter layers
model = model.merge_and_unload(progressbar=True)
print(model)
model.to(device)
model = torch.compile(model)

Expected behavior

Without calling merge_and_unload() the model seems to work. After calling merge_and_unload(), printing the model shows no LoRA layers, and the generations are quite bad, as if we were using just the original base model without any fine-tuning. It seems to be an issue with the Mamba model being used with LoRA. Thanks!

javiermcebrian avatar Apr 18 '24 12:04 javiermcebrian

cc @younesbelkada

amyeroberts avatar Apr 18 '24 12:04 amyeroberts

cc @Aniketh999

Aniketh999 avatar Apr 18 '24 15:04 Aniketh999

I'm having the same issue here:

Original model: https://huggingface.co/dominguesm/mambarim-110m
Adapter: https://huggingface.co/dominguesm/mambarim-110m-chat

When I run the merge_and_unload function, the model simply loses the fine-tuning.

DominguesM avatar May 02 '24 21:05 DominguesM

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 01 '24 08:06 github-actions[bot]

Hi! @amyeroberts, any news on this one? :) Thanks!

javiermcebrian avatar Jun 01 '24 09:06 javiermcebrian

Hi @javiermcebrian - apologies for the delay, I am on it!

younesbelkada avatar Jun 03 '24 08:06 younesbelkada

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 28 '24 08:06 github-actions[bot]

Hi @younesbelkada! The issue is marked as stale, how is it going? :) Thanks!

javiermcebrian avatar Jun 28 '24 08:06 javiermcebrian

@javiermcebrian Sadly @younesbelkada isn't at HF anymore :( Perhaps @BenjaminBossan for PEFT or @ArthurZucker for Mamba will know the state of this?

amyeroberts avatar Jun 28 '24 11:06 amyeroberts

Thanks for the ping. The issue is that PEFT merges the LoRA weights into the lm_head, since you added it to target_modules. However, the weights of the LM head are tied to the embedding weights, so the embeddings are mutated as well by the merge, which results in wrong outputs.
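
For reference, a quick way to check the tying on the base model (a minimal sketch; the attribute names assume the current MambaForCausalLM layout, with the head at model.lm_head and the input embedding at model.backbone.embeddings):

from transformers import AutoModelForCausalLM

# Load the plain base model (no LoRA) just to inspect the weight tying.
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-370m-hf")

# If the data pointers match, lm_head and the embedding share one tensor,
# so merging LoRA into lm_head also rewrites the embedding weights.
print(model.lm_head.weight.data_ptr() == model.backbone.embeddings.weight.data_ptr())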

To remedy this, I would suggest not to target the LM head with LoRA. If you need to train it, you can fully fine-tune it by passing modules_to_save=["lm_head"] to LoraConfig. Alternatively, you can untie the two weights.
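
As a minimal sketch of that suggestion (hyperparameters copied from the reproduction above):

from peft import LoraConfig, TaskType

# LoRA targets only the Mamba projection layers; the tied lm_head is instead
# trained as an independent copy via modules_to_save, so merging the LoRA
# weights no longer touches the shared embedding/lm_head tensor.
config_lora = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.01,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=["x_proj", "in_proj", "out_proj", "dt_proj"],
    modules_to_save=["lm_head"],
)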

BenjaminBossan avatar Jun 28 '24 12:06 BenjaminBossan

Ok thanks!

BTW, I suppose that if I avoid targeting the LM head with LoRA but still target the embeddings, then after merge_and_unload() the LM head will be mutated again, leading to wrong results since the weights were trained assuming a frozen LM head, right? And a similar thing would happen when fully fine-tuning the LM head via modules_to_save, right?

Then, the only possibilities I see are: 1) untie the original LM head and embedding weights, or 2) tie the LoRA weights of the LM head and embeddings. Am I right?

javiermcebrian avatar Jul 01 '24 10:07 javiermcebrian

I suppose that if I avoid targeting the LM head with LoRA but still target the embeddings, then after merge_and_unload() the LM head will be mutated again, leading to wrong results since the weights were trained assuming a frozen LM head, right?

Right. You could store a copy of the LM head and override the LM head after the merge using that copy.
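
Something along these lines (a rough sketch, reusing the variable names from the inference snippet above and assuming only the embeddings, not lm_head, are LoRA targets):

import copy

# Keep an untouched copy of the (tied) LM head before merging.
lm_head_backup = copy.deepcopy(model.get_base_model().lm_head)

# Merging LoRA into the embeddings also mutates the tied lm_head weight ...
model = model.merge_and_unload(progressbar=True)

# ... so restore the head from the copy afterwards. Assigning a fresh module
# here also unties it from the now-merged embedding weights.
model.lm_head = lm_head_backup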

And a similar thing would happen when fully fine-tuning the LM head via modules_to_save, right?

No, modules_to_save creates a new copy of the targeted module, so there is no issue with tied weights. Only if you were to switch back to the original LM head (e.g. when you disable LoRA) would you still see the issue.

Then, the only possibilities I see are: 1) untie the original LM head and embedding weights, or 2) tie the LoRA weights of the LM head and embeddings. Am I right?

Apart from what I said above, this should also work, yes.
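
For completeness, a rough sketch of option 1, untying the head before applying LoRA (the clone gives lm_head its own tensor; whether the config flag alone keeps them untied across save/load is an assumption worth double-checking):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-370m-hf")

# Give lm_head its own parameter so it no longer shares storage with the
# input embedding, and record the choice in the config.
model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.clone())
model.config.tie_word_embeddings = False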

BenjaminBossan avatar Jul 01 '24 11:07 BenjaminBossan

thanks!

javiermcebrian avatar Jul 01 '24 11:07 javiermcebrian