bugfix: propagate weight key_mapping to peft to fix 4.52 VLM renaming
What does this PR do?
Fixes behavior introduced in the 4.52 VLM refactor #37033, which renames VLM model weights but overlooks adapters. (In theory, the problem is just that key_mapping was not propagated to PEFT, but it is made much more salient by the recent big changes in VLM naming, which effectively deprecate all existing adapters.)
@zucchini-nlp
Note that this only covers loading; I'm not sure we want to generalize this to saving... I kind of feel we might just want people to save new adapters in the correct format going forward (and slowly convert all models to the new format while ensuring backward compatibility).
I mean, this is not even PEFT-specific: this code works without the peft library. Very intuitively, it just changes the names of the modules the adapters attach to, matching the code you wrote to rename the params in the base model!
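For illustration, here is a minimal sketch of that idea, with a hypothetical regex mapping (not the exact conversion dict used in transformers for Qwen2-VL):

```python
# Minimal sketch of the renaming idea; the mapping below is illustrative,
# not the exact dict used in transformers.
import re

key_mapping = {r"^model\.layers": "model.language_model.layers"}

old_adapter_key = "model.layers.0.self_attn.q_proj.lora_A.default.weight"
new_adapter_key = old_adapter_key
for pattern, replacement in key_mapping.items():
    new_adapter_key = re.sub(pattern, replacement, new_adapter_key)

print(new_adapter_key)
# -> model.language_model.layers.0.self_attn.q_proj.lora_A.default.weight
```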
@BenjaminBossan yes, exactly correct! I made the requested changes.
I'll let @zucchini-nlp implement the test; I'm on vacation until mid-June and, as you say, I'm not sure how best to proceed.
As for your question, this is exactly my case. I maintain the colpali repository, which is largely based on inheriting from VLMs and often uses PEFT. All existing checkpoints are now incompatible with the new release. I am able to remap on my end by overriding the from_pretrained behaviour to pass the key_mapping argument, but this did not propagate to PEFT adapters.
The code I wrote here should enable all PEFT checkpoints that are linked to models in the VLMs list (i.e. models for which an explicit mapping dictionary and remapping are specified) to work. I'm not sure you need to do anything on the PEFT side in your case. However, the question of saving remains: do we save in the new or the old format...
BTW @zucchini-nlp, the checks made for video preprocessor resolution also don't handle the case of adapter repos that don't have a config.json themselves (but that's a different issue altogether).
I felt fixing the loading was the most urgent part in any case, because thousands of models have probably become unusable.
1 failed because `AttributeError` -> 'NoneType' object has no attribute 'copy'
Number of failed tests: 1
Reverting back to the way it was
The code I wrote here should enable all PEFT checkpoints that are linked to models in the VLMs list (i.e. models for which an explicit mapping dictionary and remapping are specified) to work. I'm not sure you need to do anything on the PEFT side in your case.
Hmm, but PEFT itself does not go through transformers' model.load_adapter, so I'm unsure. Could you give an example of an (ideally small) model that was affected for you, so that I can test it in PEFT?
However, the question of saving remains: do we save in the new or the old format...
Since PEFT will be applied on top of the new architecture, the PEFT checkpoint itself will also assume the new architecture. IIUC, this would mean that if a user saves a PEFT checkpoint with the new architecture, and another user with an older transformers version from before #37033 tries to load it, it will fail (no forward compatibility). Is that right @zucchini-nlp?
Reverting back to the way it was
Ah yes sorry, the copy would need to be conditioned on it being a dict.
I don't know PEFT well, so you're surely right.
The mapping regexes are written so that checkpoints saved either way should load back correctly (they match against the beginning of the key names, etc.), so we should be okay on that end.
I'm doing it with colqwen2-v1.0 personally, but I'm guessing some generative qwen2-vl fine-tunes should be affected too, lemme check!
Looks like this breaks more things than we expected. My expectation was that almost all external libraries that use transformers model classes would load checkpoints with from_pretrained(). Ideally, if a user loads and saves a checkpoint with transformers, it should be fully backward compatible: the checkpoint is mapped to the new keys when loading and remapped back to the old keys before saving.
IIUC, this would mean that if a user saves a PEFT checkpoint with the new architecture, and another user with an older transformers version from before https://github.com/huggingface/transformers/pull/37033 tries to load it, it will fail (no forward compatibility). Is that right @zucchini-nlp?
This one therefore should not be a problem if we ignore the adapters for now. I believe the solution in this PR, which re-maps keys to their new style, would make it work for PEFT in the same way as in bare transformers.
Regarding the PEFT library, I am not sure either whether you need any changes, since I am not an expert. We can look at it together and check.
@BenjaminBossan Here's a minimal example with a model using a non-modified architecture (and a PEFT adapter):
```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "lightonai/MonoQwen2-VL-v0.1",
    device_map="cuda",
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
```
This produces:

```
Loading adapter weights from lightonai/MonoQwen2-VL-v0.1 led to unexpected keys not found in the model: model.layers.0.mlp.down_proj.lora_A.default.weight, model.layers.0.mlp.down_proj.lora_B.default.weight, model.layers.0.mlp.gate_proj.lora_A.default.weight, model.layers.0.mlp.gate_proj.lora_B.default.weight, model.layers.0.mlp.up_proj.lora_A.default.weight, model.layers.0.mlp.up_proj.lora_B.default.weight, ... model.language_model.layers.27.mlp.down_proj.lora_A.default.weight, model.language_model.layers.27.mlp.down_proj.lora_B.default.weight
```
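For anyone hitting this before the fix lands, something along these lines can be used to remap the adapter file manually (the substitution below is illustrative, not the exact transformers conversion mapping):

```python
# Rough workaround sketch: download the adapter weights, rename the old-style
# keys to the refactored layout, and save them back.
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file, save_file

adapter_path = hf_hub_download("lightonai/MonoQwen2-VL-v0.1", "adapter_model.safetensors")
state_dict = load_file(adapter_path)

# Old checkpoints target "model.layers...", the refactored model expects
# "model.language_model.layers..." for the text backbone.
remapped = {
    key.replace("model.layers.", "model.language_model.layers.", 1): value
    for key, value in state_dict.items()
}
save_file(remapped, "adapter_model.safetensors")
```

The remapped file, placed in a local folder next to the original adapter_config.json, should then be loadable with model.load_adapter pointed at that folder.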
Thanks for the pointer. I ran my own tests to check all directions, using transformers v4.49.0 as the old state, which should be from before the mentioned PR, and transformers main (8c59cdb3f81f9f4b8e7b8d05b92d40c2e9413788) as the new state. This is the script I used:
```python
import os

from transformers import Qwen2VLForConditionalGeneration
from peft import LoraConfig, PeftModel, get_peft_model

model_id = "Qwen/Qwen2-VL-2B-Instruct"
path = "/tmp/peft/38627"

# Load the base model
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)

if not os.path.exists(path + "/adapter_model.safetensors"):
    # Define LoRA configuration
    lora_config = LoraConfig(target_modules=["q_proj", "v_proj"], init_lora_weights=False)
    # Create PEFT model
    peft_model = get_peft_model(model, lora_config)
    peft_model.save_pretrained(path)
    print("no existing PEFT model found, created and saved new one at", path)
else:
    # Load existing PEFT model
    peft_model = PeftModel.from_pretrained(model, path)
    print("loaded existing PEFT model from", path)
```
Indeed, there are now issues when the version used to save the PEFT checkpoint and to load it are not identical:
| test | create version | load version | outcome |
|---|---|---|---|
| 1 | v4.49.0 | v4.49.0 | works |
| 2 | main | main | works |
| 3 | v4.49.0 | main | UserWarning: Found missing adapter keys while loading the checkpoint: ['base_model.model.model.language_model.layers.0.self_attn.q_proj.lora_A.default.weight', ...] |
| 4 | main | v4.49.0 | UserWarning: Found missing adapter keys while loading the checkpoint: ['base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight' ...] |
To explain why this happens: PEFT adds new modules on top of the base model and also wraps it. So if the base model has a module like model.layers.0.self_attn.q_proj, it becomes base_model.model.model.layers.0.self_attn.q_proj after wrapping. When save_pretrained is called on a PEFT model, only the PEFT part is stored, with keys like base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight. Now, when the model structure changes, these keys won't fit anymore, resulting in the warnings shown above.
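To make the prefixing concrete, here is a tiny self-contained demo (the module below is a made-up stand-in; any model exposing a q_proj linear layer behaves the same):

```python
# Tiny demo of the key prefix PEFT adds around the base model.
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)

class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([Block()])

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = Inner()  # mimics the old "model.layers..." layout

peft_model = get_peft_model(Tiny(), LoraConfig(target_modules=["q_proj"]))
print([k for k in peft_model.state_dict() if "lora_A" in k])
# ['base_model.model.model.layers.0.q_proj.lora_A.default.weight']
```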
I think what we would need to do in PEFT is apply the same mapping as in this PR to the PEFT checkpoint. I'm not sure we have access to key_mapping inside of PeftModel.from_pretrained though; help would be appreciated.
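The remapping itself could look roughly like the sketch below (the helper name and prefix handling are hypothetical, not the actual PEFT implementation), reusing the base model's own `_checkpoint_conversion_mapping` on the checkpoint keys:

```python
# Hypothetical sketch of remapping a PEFT checkpoint with the base model's own
# conversion mapping; not the actual PEFT implementation.
import re

def remap_peft_state_dict(peft_state_dict, base_model):
    mapping = getattr(base_model, "_checkpoint_conversion_mapping", None) or {}
    prefix = "base_model.model."  # prefix PEFT adds around the base model
    remapped = {}
    for key, value in peft_state_dict.items():
        has_prefix = key.startswith(prefix)
        stripped = key[len(prefix):] if has_prefix else key
        # The conversion regexes are anchored on bare base-model keys, so apply
        # them after stripping the PEFT prefix and re-add the prefix afterwards.
        for pattern, replacement in mapping.items():
            stripped = re.sub(pattern, replacement, stripped)
        remapped[(prefix + stripped) if has_prefix else stripped] = value
    return remapped
```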
Thanks for investigating!
In the main lib, conversion is done by hard-coding a list of models for which we want to use their specified mapping attribute (_checkpoint_conversion_mapping). I'm guessing you could take the same list (called VLMs) and apply the mapping from the base model in the same way, automatically? Having a key_mapping argument might of course also help non-standard use cases.
This might be a bit independent of this PR though, since it's in PEFT code? The code I sent should be fixed just by merging this PR (backward compatibility at least).
I'm guessing you could take the same list (called VLMs) and apply the mapping from the base model in the same way, automatically?
Yes, this is true, I prototyped a patch that does that and it indeed works, making #37033 backwards compatible with older checkpoints. However, this still leaves the problem of using a checkpoint from after #37033 with a transformers version from before (forwards compatibility). This doesn't work as the (inverse) mapping does not exist in older transformers versions. I also don't think that there is a nice way to detect this so that we can instruct users to upgrade transformers, we could only try to make guesses. IMO this is not very nice, as I imagine this could be a real issue for users. @zucchini-nlp do you have an idea how to address this?
This might be a bit independent of this PR though, since it's in PEFT code?
Yes, sorry for spamming this PR. I can create a separate issue.
Very cool! Haha, no problem at all of course, I actually wasn't sure since I hadn't quite looked at the scope of the "integrations.peft" code in transformers. Thanks a ton for looking into all of this!
I created a PR to address this in PEFT: https://github.com/huggingface/peft/pull/2574. I used fake mini models to mimic the old and new model architectures for testing. The tests are still failing because of the issue I mentioned above.
Hello @zucchini-nlp, any news?
Let's wait for a core maintainer's review, ping @Cyrilvallez :)
Thanks a lot for taking a look @Cyrilvallez ! Do you have any further comments @zucchini-nlp ?