
bugfix: propagate weight key_mapping to peft to fix 4.52 VLM renaming

Open ManuelFay opened this issue 6 months ago • 15 comments

What does this PR do?

Fixes behavior introduced by the 4.52 VLM refactor (#37033), which renames VLM model weights but did not account for adapters. (In theory, the problem is simply that key_mapping was not propagated to PEFT, but it is made much more salient by the recent big changes in VLM naming, which basically deprecate all existing adapters.)

@zucchini-nlp

Note that this only covers loading; I'm not sure we want to generalize it to saving... I kind of feel we might just want people to save new adapters in the correct format going forward (and slowly convert all models to the new format while ensuring backward compatibility).
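For context, the remapping itself boils down to a regex substitution over checkpoint keys, roughly along these lines (a minimal sketch, not the actual transformers implementation; the mapping entries below are placeholders for illustration):

import re

# Placeholder old -> new mapping, in the spirit of the per-model
# _checkpoint_conversion_mapping dicts (regex pattern -> replacement).
key_mapping = {
    r"^model\.layers": "model.language_model.layers",
    r"^visual": "model.visual",
}

def remap_keys(state_dict, key_mapping):
    # Return a new state dict with keys renamed according to the mapping.
    remapped = {}
    for old_key, value in state_dict.items():
        new_key = old_key
        for pattern, replacement in key_mapping.items():
            new_key = re.sub(pattern, replacement, new_key)
        remapped[new_key] = value
    return remapped

# An old-style key gets rewritten to the new VLM layout:
print(remap_keys({"model.layers.0.self_attn.q_proj.weight": None}, key_mapping))
# {'model.language_model.layers.0.self_attn.q_proj.weight': None}

The point of this PR is that the same renaming also gets applied to the adapter weights loaded on top of the base model, not just to the base model's own state dict.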

ManuelFay avatar Jun 05 '25 23:06 ManuelFay

I mean, this is not even PEFT-specific: the code works without the peft library. Intuitively, it just renames the modules the adapters attach to, matching the code you wrote to rename the params in the base model!

ManuelFay avatar Jun 06 '25 08:06 ManuelFay

@BenjaminBossan yes, exactly correct! I made the requested changes.

I'll let @zucchini-nlp implement the test; I'm on vacation until mid-June and, as you say, I'm not sure how best to proceed.

As for your question, this is exactly my case. I maintain the colpali repository, which is largely based on inheriting from VLMs and often uses PEFT. All existing checkpoints are now incompatible with the new release. I am able to remap on my end with the key_mapping argument, by overriding the from_pretrained behaviour to add this kwarg, but that did not propagate to the PEFT adapters.
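For illustration, the workaround on the colpali side looks roughly like this (a rough sketch; the exact regexes are placeholders and depend on the model):

from transformers import Qwen2VLForConditionalGeneration

# Placeholder mapping from the old flat layout to the new nested VLM layout;
# the real regexes are model-specific.
key_mapping = {
    r"^model\.layers": "model.language_model.layers",
    r"^visual": "model.visual",
}

# key_mapping is applied to the base model's weights at load time, but (before
# this PR) it was not propagated to the PEFT adapter weights loaded on top.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    key_mapping=key_mapping,
)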

The code I wrote here should enable all PEFT checkpoints linked to models in the VLMs list (i.e. those for which an explicit mapping dictionary is specified and remapping is applied) to load correctly. I'm not sure you need to do anything on the PEFT side in your case. However, the question of saving remains: do we save in the new or the old format?

BTW @zucchini-nlp, the checks done for the video preprocessor resolution also don't account for adapters that don't have a config.json in the repo itself (but that's a different issue altogether).

I felt fixing the loading was the most urgent part in any case, because thousands of models have probably become unloadable.

ManuelFay avatar Jun 06 '25 09:06 ManuelFay

1 test failed because of AttributeError -> 'NoneType' object has no attribute 'copy'. Number of failed tests: 1

Reverting back to the way it was

ManuelFay avatar Jun 06 '25 09:06 ManuelFay

The code I wrote here should enable all PEFT checkpoints linked to models in the VLMs list (i.e. those for which an explicit mapping dictionary is specified and remapping is applied) to load correctly. I'm not sure you need to do anything on the PEFT side in your case.

Hmm, but PEFT itself does not go through transformers' model.load_adapter, so I'm unsure. Could you give an example of an (ideally small) model that was affected for you, so that I can test it in PEFT?

However, the question of saving remains: do we save in the new or the old format?

Since PEFT will be applied on top of the new architecture, the PEFT checkpoint itself will also assume the new architecture. IIUC, this would mean that if a user saves a PEFT checkpoint with the new architecture, and another user with an older transformers version from before #37033 tries to load it, it will fail (no forward compatibility). Is that right @zucchini-nlp?

Reverting back to the way it was

Ah yes sorry, the copy would need to be conditioned on it being a dict.

BenjaminBossan avatar Jun 06 '25 10:06 BenjaminBossan

I don't know PEFT well, so you're surely right.

The mapping regexes are anchored at the beginning of the key names, so checkpoints saved either way should load back correctly; we should be okay on that end.
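To illustrate what I mean (a small sketch; the regex below is a placeholder, not the exact one in transformers): because the patterns are anchored with ^, a key that is already in the new format passes through untouched, while an old-format key gets converted.

import re

# Assumed old -> new regex, anchored at the start of the key.
mapping = {r"^model(?!\.(language_model|visual))": "model.language_model"}

def convert(key):
    for pattern, replacement in mapping.items():
        key = re.sub(pattern, replacement, key)
    return key

print(convert("model.layers.0.self_attn.q_proj.weight"))
# model.language_model.layers.0.self_attn.q_proj.weight  -> old key is converted
print(convert("model.language_model.layers.0.self_attn.q_proj.weight"))
# unchanged -> already-new key is left alone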

I'm doing it with colqwen2-v1.0 personally, but I'm guessing some generative qwen2-vl fine-tunes should be affected too, lemme check!

ManuelFay avatar Jun 06 '25 10:06 ManuelFay

Looks like that breaks more things than we expected. My expectation was that almost all external libraries that use transformers model classes would load checkpoints with from_pretrained(). Ideally, if a user loads and saves a checkpoint with transformers, it should be fully backward compatible: the checkpoint is mapped to the new keys when loading and remapped back to the old keys before saving.

IIUC, this would mean that if a user saves a PEFT checkpoint with the new architecture, and another user with an older transformers version from before https://github.com/huggingface/transformers/pull/37033 tries to load it, it will fail (no forward compatibility). Is that right @zucchini-nlp?

This one therefore should not be a problem, if we ignore the adapters for now. I believe the solution in this PR, which re-maps keys to their new style, would make it work for PEFT in the same way as in bare transformers.

Regarding the PEFT library, I am not sure either whether you need any changes, since I am not an expert. We can look at it together and check.

zucchini-nlp avatar Jun 06 '25 10:06 zucchini-nlp

@BenjaminBossan Here's a minimal example with a model with a non-modified architecture (and a PEFT adapter):

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
# Loading the adapter repo directly: transformers resolves the base model,
# then tries to attach the adapter weights, which triggers the warning below.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "lightonai/MonoQwen2-VL-v0.1",
    device_map="cuda",
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)

Loading adapter weights from lightonai/MonoQwen2-VL-v0.1 led to unexpected keys not found in the model: model.layers.0.mlp.down_proj.lora_A.default.weight, model.layers.0.mlp.down_proj.lora_B.default.weight, model.layers.0.mlp.gate_proj.lora_A.default.weight, model.layers.0.mlp.gate_proj.lora_B.default.weight, model.layers.0.mlp.up_proj.lora_A.default.weight, model.layers.0.mlp.up_proj.lora_B.default.weight, ... model.language_model.layers.27.mlp.down_proj.lora_A.default.weight, model.language_model.layers.27.mlp.down_proj.lora_B.default.weight
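To see the mismatch directly, one can compare the keys stored in the adapter file against the modules of the freshly loaded model (a sketch, assuming the repo stores its weights as adapter_model.safetensors, and reusing the model loaded above):

from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download just the adapter weights and inspect their keys.
adapter_path = hf_hub_download("lightonai/MonoQwen2-VL-v0.1", "adapter_model.safetensors")
with safe_open(adapter_path, framework="pt") as f:
    adapter_keys = list(f.keys())

# Old-style keys reference model.layers..., while the new architecture nests
# the text decoder under model.language_model.layers..., hence the warning.
print(adapter_keys[:2])
print(any("language_model" in name for name, _ in model.named_modules()))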

ManuelFay avatar Jun 06 '25 10:06 ManuelFay

Thanks for the pointer. I ran my own tests to check all directions, using transformers v4.49.0 as the old state, which should be from before the mentioned PR, and transformers main (8c59cdb3f81f9f4b8e7b8d05b92d40c2e9413788) as the new state. This is the script I used:

import os

from transformers import Qwen2VLForConditionalGeneration
from peft import LoraConfig, PeftModel, get_peft_model


model_id = "Qwen/Qwen2-VL-2B-Instruct"
path = "/tmp/peft/38627"

# Load processor and model
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)

if not os.path.exists(path + "/adapter_model.safetensors"):
    # Define LoRA configuration
    lora_config = LoraConfig(target_modules=["q_proj", "v_proj"], init_lora_weights=False)
    # Create PEFT model
    peft_model = get_peft_model(model, lora_config)
    peft_model.save_pretrained(path)
    print("no existing PEFT model found, created and saved new one at", path)
else:
    # Load existing PEFT model
    peft_model = PeftModel.from_pretrained(model, path)
    print("loaded existing PEFT model from", path)

Indeed, there are now issues when the version used to save the PEFT checkpoint and to load it are not identical:

| test | create version | load version | outcome |
|------|----------------|--------------|---------|
| 1 | v4.49.0 | v4.49.0 | works |
| 2 | main | main | works |
| 3 | v4.49.0 | main | UserWarning: Found missing adapter keys while loading the checkpoint: ['base_model.model.model.language_model.layers.0.self_attn.q_proj.lora_A.default.weight', ...] |
| 4 | main | v4.49.0 | UserWarning: Found missing adapter keys while loading the checkpoint: ['base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight', ...] |

To explain why this happens: PEFT adds new modules on top of the base model and also wraps it. So if the base model has a structure like model.layers.0.self_attn.q_proj, it becomes base_model.model.model.layers.0.self_attn.q_proj. When save_pretrained is called on a PEFT model, only the PEFT part is stored, with keys like base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight. Now, when the model structure changes, these keys no longer match, resulting in the warnings shown above.

I think what we would need to do in PEFT is to apply the same mapping as in this PR to the PEFT checkpoint. I'm not sure if we have access to key_mapping inside of PeftModel.from_pretrained though, help would be appreciated.
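To make that more concrete, here is a rough sketch of what such a remapping could look like (purely illustrative: remap_adapter_state_dict is not an existing PEFT function, and it assumes the base model exposes its _checkpoint_conversion_mapping regexes):

import re

def remap_adapter_state_dict(adapter_state_dict, base_model):
    # Apply the base model's old -> new key regexes to a saved adapter state dict.
    mapping = getattr(base_model, "_checkpoint_conversion_mapping", None) or {}
    prefix = "base_model.model."  # PEFT's wrapper prefix on saved keys
    remapped = {}
    for key, value in adapter_state_dict.items():
        has_prefix = key.startswith(prefix)
        inner = key[len(prefix):] if has_prefix else key
        for pattern, replacement in mapping.items():
            inner = re.sub(pattern, replacement, inner)
        remapped[(prefix + inner) if has_prefix else inner] = value
    return remapped

The open question is exactly where in PeftModel.from_pretrained the key_mapping (or the base model's own mapping) would be available to apply something like this.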

BenjaminBossan avatar Jun 06 '25 10:06 BenjaminBossan

Thanks for investigating!

In the main lib, conversion is done by hard-coding a list of models for which we want to use their specified mapping attribute (_checkpoint_conversion_mapping). I'm guessing you could take the same list (called VLMs) and apply the mapping from the base model in the same way automatically? Having a key_mapping argument might of course also help non-standard use cases.
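For reference, that attribute is just a per-model dict of regex pattern -> replacement applied to checkpoint keys at load time; something along these lines (illustrative entries only, not copied from the actual Qwen2-VL class):

# Assumed shape of a _checkpoint_conversion_mapping: old-style prefixes are
# rewritten to the new nested layout, anchored at the start of the key.
_checkpoint_conversion_mapping = {
    r"^visual": "model.visual",
    r"^model(?!\.(language_model|visual))": "model.language_model",
}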

This might be a bit independent from this PR though, since it's in PEFT code? The code I sent should be fixed just by merging this PR (backward compatibility at least).

ManuelFay avatar Jun 06 '25 11:06 ManuelFay

I'm guessing you could take the same list (called VLMs) and apply the mapping from the base model in the same way automatically?

Yes, this is true: I prototyped a patch that does that, and it indeed works, making #37033 backwards compatible with older checkpoints. However, this still leaves the problem of using a checkpoint from after #37033 with a transformers version from before it (forwards compatibility). This doesn't work, as the (inverse) mapping does not exist in older transformers versions. I also don't think there is a nice way to detect this so that we can instruct users to upgrade transformers; we could only try to make guesses. IMO this is not very nice, as I imagine this could be a real issue for users. @zucchini-nlp do you have an idea how to address this?

This might be a bit independent from this PR though, since it's in PEFT code?

Yes, sorry for spamming this PR. I can create a separate issue.

BenjaminBossan avatar Jun 06 '25 12:06 BenjaminBossan

Very cool! Haha, no problem at all of course, I actually wasn't sure, since I hadn't quite looked at the scope of the "integrations.peft" code in transformers. Thanks a ton for looking into all of this!

ManuelFay avatar Jun 06 '25 12:06 ManuelFay

I created a PR to address this in PEFT: https://github.com/huggingface/peft/pull/2574. I used fake mini models to mimic the old and new model architectures for testing. The tests are still failing because of the issue I mentioned above.

BenjaminBossan avatar Jun 06 '25 15:06 BenjaminBossan

Hello @zucchini-nlp, any news?

ManuelFay avatar Jun 12 '25 05:06 ManuelFay

Let's wait for a core maintainer's review, ping @Cyrilvallez :)

zucchini-nlp avatar Jun 12 '25 06:06 zucchini-nlp

Thanks a lot for taking a look @Cyrilvallez ! Do you have any further comments @zucchini-nlp ?

ManuelFay avatar Jun 15 '25 15:06 ManuelFay