
`lora.Linear.weight` Parameters Change After Loading Checkpoint in Train Mode Leading to Inconsistent Evaluation Results

Open · sunpihai-up opened this issue 11 months ago · 2 comments

Issue Summary

While fine-tuning a model by substituting some nn.Linear layers with lora.Linear, I noticed that the evaluation results during training differ from those after loading a checkpoint. More specifically, performing a "load-infer-save" cycle on a checkpoint without conducting any training led to changes in the weight parameters of the lora.Linear layers. Other parameters such as bias and lora_A within lora.Linear did not exhibit this behavior.

Steps to Reproduce

  1. Replace certain nn.Linear layers within the model with lora.Linear for fine-tuning.
  2. Save the entire model state without differentiating between LoRA-specific parameters and pretrained model parameters.
  3. Ensure the model is in train mode.
  4. Load the saved checkpoint using load_state_dict.
  5. Observe that the weight parameter of lora.Linear layers changes after loading, which leads to inconsistent evaluation outcomes.

Root Cause Analysis

The problem appears to occur because calling load_state_dict while the model is in train mode leads to altered weight parameters in the lora.Linear layers. This alteration might be related to the merging and unmerging of the LoRA parameters into the corresponding pretrained weights.
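
For context, this merging behavior comes from loralib's lora.Linear train()/eval() hooks (when merge_weights=True): entering eval mode folds the low-rank update into weight in place, and entering train mode subtracts it again. Below is a simplified sketch of that logic, paraphrased rather than copied from the loralib source:

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Linear):
    # Illustration of loralib-style merge/unmerge; not the actual implementation.
    def __init__(self, in_features, out_features, r=8, lora_alpha=16, merge_weights=True):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero-initialized, as in loralib
        self.scaling = lora_alpha / r
        self.merge_weights = merge_weights
        self.merged = False

    def train(self, mode: bool = True):
        # nn.Module.eval() calls train(False), so both transitions pass through here.
        super().train(mode)
        if mode and self.merge_weights and self.merged:
            # Entering train mode: subtract the update so weight holds only the pretrained part.
            self.weight.data -= (self.lora_B @ self.lora_A) * self.scaling
            self.merged = False
        elif not mode and self.merge_weights and not self.merged:
            # Entering eval mode: fold the low-rank update into weight, in place.
            self.weight.data += (self.lora_B @ self.lora_A) * self.scaling
            self.merged = True

Because the merge modifies weight.data in place, the value stored under a weight key depends on which mode the layer was in when the state dict was taken or loaded.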

Solution Applied

To address this issue, switch the model to eval mode before invoking load_state_dict. This approach ensures that the weight parameters of lora.Linear layers remain stable both before and after loading. Moreover, switching between eval and train modes afterward does not result in anomalies.
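
A minimal sketch of this loading order (Model and checkpoint_path are placeholders, as in the reproduction script below):

import torch

checkpoint_path = "..."  # placeholder
model = Model(...)       # contains lora.Linear layers

model.eval()             # switch to eval mode *before* loading
model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"), strict=True)

# Toggling modes afterwards no longer produces anomalies.
model.train()
model.eval()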

Is this behavior expected? If so, it would be helpful to document this behavior or adjust the implementation to prevent confusion among other users.

The following script may help reproduce the issue.

import torch


def compare_model_weights(state_dict1, state_dict2):
    # Compare the differences between two state_dict objects 
    # (whether they have the same keys and the same values).
    keys1 = set(state_dict1.keys())
    keys2 = set(state_dict2.keys())

    missing_in_model1 = keys2 - keys1  # Keys present in model2 but not in model1
    missing_in_model2 = keys1 - keys2  # Keys present in model1 but not in model2

    all_match = True

    if missing_in_model1 or missing_in_model2:
        all_match = False
        print("State dict keys do not match.\n")

        if missing_in_model1:
            print(f"Keys missing in model1: {missing_in_model1}\n")

        if missing_in_model2:
            print(f"Keys missing in model2: {missing_in_model2}\n")
        
    common_keys = keys1.intersection(keys2)
    for key in common_keys:
        if not torch.allclose(state_dict1[key], state_dict2[key]):
            all_match = False
            print(f"Weight mismatch found at layer: {key}\n")
            print(f"Model 1 tensor: {state_dict1[key]}\n")
            print(f"Model 2 tensor: {state_dict2[key]}\n")
            print("-" * 80 + "\n")

    if all_match:
        print("All weights match.")
    return all_match


checkpoint_path = "..."
# This checkpoint contains all the weights of the model, 
# including those belonging to LoRA and those of the pre-trained model.
ckp = torch.load(checkpoint_path, map_location="cpu")

# The model contains layers of lora.Linear().
model = Model(...)
# Loading weights in training mode may lead to anomalies.
model.train()
model.load_state_dict(ckp, strict=True)
ckp2 = model.state_dict()

# This is very strange. If I execute model.eval(), 
# ckp and ckp2 are different; if I remove it, they are the same.
model.eval()
compare_model_weights(ckp, ckp2)

sunpihai-up avatar Jan 17 '25 12:01 sunpihai-up

# ... As above ...
import copy
ckp_copy = copy.deepcopy(ckp)
ckp2_copy = copy.deepcopy(ckp2)

model.eval()
compare_model_weights(ckp_copy, ckp2_copy)

The above code now reports that ckp_copy and ckp2_copy are identical. This indicates that switching the model to eval mode triggers LoRA's weight merging, which alters the model's weight tensors in place. Since state_dict() returns references to the live parameter tensors rather than copies, ckp2 is mutated retroactively by the merge, while the deep copies taken beforehand are unaffected. As a result, the weights used for inference can differ from those obtained during training. This discrepancy may stem from saving and loading all parameters together, rather than handling the LoRA parameters and the pretrained parameters separately as the README examples demonstrate.
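
The shared-storage point can be checked in isolation, independent of LoRA (a small sanity check, not from the thread):

import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
sd = layer.state_dict()      # these tensors share storage with the live parameters
before = sd["weight"].clone()

layer.weight.data += 1.0     # in-place update, analogous to LoRA's merge on eval()
print(torch.equal(sd["weight"], before))             # False: the state dict entry changed too
print(torch.equal(sd["weight"], layer.weight.data))  # True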

Of course, as mentioned in the "Solution Applied" section above, switching the model to eval mode before calling load_state_dict prevents the problem, although the underlying reason why this works remains unclear.

sunpihai-up avatar Jan 17 '25 12:01 sunpihai-up

Sorry, but could you kindly explain how you substitute certain nn.Linear layers with lora.Linear? Specifically, do you transfer the pretrained weights from the nn.Linear layers of the pretrained model to lora.Linear before replacing them? I would truly appreciate it if you could share your approach along with a code example.

hungtp2504 avatar Feb 21 '25 10:02 hungtp2504
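
For reference, one common pattern with loralib is to build a lora.Linear with the same shape, copy the pretrained weight and bias into it, and then swap the module in. A minimal sketch of that pattern (illustrative only, not the original poster's code):

import loralib as lora
import torch.nn as nn

def replace_linear_with_lora(module: nn.Module, r: int = 8, lora_alpha: int = 16):
    # Recursively replace nn.Linear children with lora.Linear,
    # transferring the pretrained weight/bias into each new layer.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            new_layer = lora.Linear(
                child.in_features,
                child.out_features,
                r=r,
                lora_alpha=lora_alpha,
                bias=child.bias is not None,
            )
            # Copy the pretrained parameters. lora_A/lora_B keep their default
            # initialization; lora_B starts at zero, so the replaced layer
            # initially computes the same output as the original nn.Linear.
            new_layer.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                new_layer.bias.data.copy_(child.bias.data)
            setattr(module, name, new_layer)
        else:
            replace_linear_with_lora(child, r=r, lora_alpha=lora_alpha)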