
peft " target_modules='all-linear' " have different behavior between x86 and aarch ?

Open HuangChiEn opened this issue 2 months ago • 3 comments

System Info

I have tested on two architectures (x86 and ARM) and found this bug. Both have peft==0.17.1.
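For completeness, a small snippet like the one below (a sketch, not part of the original report) can be run on both machines to capture the exact environment being compared:

import platform
import sys

import torch
import transformers
import peft

# Print the fields that matter when comparing the x86 and ARM runs.
print("arch:", platform.machine())
print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)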

Who can help?

@benjaminbossan @githubnemo

Reproduction

Reproduction script : bug_reprod.py

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("OpenGVLab/InternVL3_5-1B-HF", trust_remote_code=True)
lm_head = model.lm_head
model = model.language_model
model.lm_head = lm_head

from peft import get_peft_model
from peft import LoraConfig

peft_config = LoraConfig(
    inference_mode=False, 
    r=12,
    target_modules="all-linear",
)
bug_model = get_peft_model(model, peft_config)
bug_model.print_trainable_parameters()
breakpoint()  # p bug_model, you will find lm_head have different results

Put bug_reprod.py on both the x86 and aarch machines and run it; you will find that lm_head differs between the two. The following screenshots show the difference:

x86: [screenshot of bug_model.lm_head]

aarch: [screenshot of bug_model.lm_head]

Expected behavior

target_modules='all-linear' should exclude lm_head from LoRA tuning. At the very least, the x86 and ARM architectures should behave identically.
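As a concrete check (a sketch that assumes the bug_reprod.py script above), the breakpoint inspection can be turned into an explicit assertion appended after get_peft_model():

import torch.nn as nn

# On a correct run, lm_head stays a plain nn.Linear with no LoRA submodules.
lm_head = bug_model.lm_head
assert isinstance(lm_head, nn.Linear), f"lm_head was wrapped: {type(lm_head)}"
assert not hasattr(lm_head, "lora_A"), "lm_head received LoRA adapters!"
print("OK: lm_head is excluded from LoRA targeting")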

HuangChiEn avatar Oct 29 '25 03:10 HuangChiEn

Hmm, that's a very strange problem. Unfortunately, I don't have an ARM machine to test this on. What exactly are you using?

The logic to determine what layers to exclude is found here in PEFT:

https://github.com/huggingface/peft/blob/5d58b515672d31c57cc61a4c50a6635df645ec20/src/peft/tuners/tuners_utils.py#L1782-L1807

I really don't see how the architecture can influence this. What you could try to help debugging is to set a breakpoint in this part of PEFT and see how the code execution differs between the two architectures.
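One way to do that without editing the installed package is sketched below; it assumes the logic lives in a module-level function named _maybe_include_all_linear_layers in tuners_utils.py, so check your installed version in case the name or location differs:

import pdb

import peft.tuners.tuners_utils as tuners_utils

# Rough debugging sketch, not an official PEFT API: wrap the suspected function
# so pdb stops right before the exclusion logic runs.
_orig = tuners_utils._maybe_include_all_linear_layers  # assumed function name

def _traced(*args, **kwargs):
    pdb.set_trace()  # step through and inspect module_names_to_exclude
    return _orig(*args, **kwargs)

tuners_utils._maybe_include_all_linear_layers = _traced
# ...then call get_peft_model(model, peft_config) as in bug_reprod.py.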

If you just want a solution to your issue, the most robust way would be to set target_modules=[...] where you list all layers you want to target explicitly.
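For example (a minimal sketch; the names below are the usual Qwen-style attention and MLP projections and may need adjusting for other models):

from peft import LoraConfig

# Explicitly listing the layers bypasses the "all-linear" heuristic entirely,
# so lm_head can never be picked up.
peft_config = LoraConfig(
    r=12,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)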

BenjaminBossan avatar Oct 29 '25 10:10 BenjaminBossan

Quoting the linked exclusion logic from peft/src/peft/tuners/tuners_utils.py (lines 1782 to 1807 at 5d58b51):

# Try to remove linear layers that should not be targeted as best as possible. We have to rely on convention as
# there are no hard rules to detect these modules.
module_names_to_exclude = set()
if isinstance(model, PreTrainedModel):
    output_emb = model.get_output_embeddings()
    if output_emb is not None:
        # ignore the last classification head for text generation models
        last_module_name = [name for name, module in model.named_modules() if module is output_emb][0]
        module_names_to_exclude.add(last_module_name)
    elif peft_config.task_type == TaskType.SEQ_CLS:
        # ignore classifier head for classification models (issue 2027)
        # there is no fix name for the classifier head, so check the common ones
        for name in SEQ_CLS_HEAD_NAMES:
            cls_head = getattr(model, name, None)
            if cls_head is not None:
                last_module_name = [name for name, module in model.named_modules() if module is cls_head][0]
                module_names_to_exclude.add(last_module_name)
                break

# we don't want nested LoRA layers, i.e. LoRA being applied to possibly existing lora_A, lora_B, etc.
# see 2390
for prefix, module in model.named_modules():
    if isinstance(module, BaseTunerLayer):
        for suffix, child in module.named_modules():
            if suffix:
                module_names_to_exclude.add(f"{prefix}.{suffix}")

Yes, indeed it's strange... Thanks for your quick reply. I'll set a breakpoint to inspect it.

I may provide some info here later.

HuangChiEn avatar Oct 30 '25 00:10 HuangChiEn

Hi @BenjaminBossan,

Jumping in here, and just to clarify for you @HuangChiEn: I'm not a maintainer, just another contributor like you. I went through the discussion between you both and figured I'd test this out on ARM as well. While I also don't see why the architecture would cause any behavioural difference, I tried reproducing the issue on my device but couldn't. [screenshot of the run]

I wanted to be sure that nothing was amiss, so I also wrote a rough validator script to check everything else:

import platform
import sys
import torch

print("="*80)
print("EXHAUSTIVE DEBUGGING - ALL ANGLES")
print("="*80)
print(f"Architecture: {platform.machine()}")
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")

import peft
import transformers
print(f"PEFT: {peft.__version__}")
print(f"Transformers: {transformers.__version__}")

from transformers import AutoModelForImageTextToText
from peft import LoraConfig, get_peft_model

print("\n" + "="*80)
print("LOADING MODEL - EXACT REPRODUCTION")
print("="*80)

model = AutoModelForImageTextToText.from_pretrained("OpenGVLab/InternVL3_5-1B-HF", trust_remote_code=True)
lm_head = model.lm_head
model = model.language_model
model.lm_head = lm_head

print(f"\n1. Model structure:")
print(f"   model.__class__: {model.__class__}")
print(f"   model type: {type(model)}")

print(f"\n2. lm_head analysis:")
print(f"   lm_head type: {type(model.lm_head)}")
print(f"   lm_head id: {id(model.lm_head)}")
print(f"   lm_head device: {model.lm_head.weight.device}")

print(f"\n3. get_output_embeddings() analysis:")
output_emb = model.get_output_embeddings()
print(f"   output_emb: {output_emb}")
print(f"   output_emb type: {type(output_emb)}")
print(f"   output_emb id: {id(output_emb)}")
if output_emb is not None and hasattr(output_emb, 'weight'):
    print(f"   output_emb device: {output_emb.weight.device}")

print(f"\n4. Identity check:")
print(f"   output_emb is model.lm_head: {output_emb is model.lm_head}")
print(f"   output_emb == model.lm_head: {output_emb == model.lm_head}")
print(f"   id(output_emb) == id(model.lm_head): {id(output_emb) == id(model.lm_head)}")

print(f"\n5. Module registration:")
print(f"   'lm_head' in model._modules: {'lm_head' in model._modules}")
print(f"   'lm_head' in dir(model): {'lm_head' in dir(model)}")

print(f"\n6. named_modules() iteration:")
all_names = []
lm_head_matches = []
output_emb_matches = []

for name, module in model.named_modules():
    all_names.append(name)
    if 'lm_head' in name or name == 'lm_head':
        lm_head_matches.append((name, type(module).__name__, id(module)))
    if module is output_emb:
        output_emb_matches.append((name, type(module).__name__, id(module)))

print(f"   Total modules: {len(all_names)}")
print(f"   Modules with 'lm_head': {len(lm_head_matches)}")
for name, typ, mid in lm_head_matches:
    print(f"      - '{name}': {typ} (id={mid})")
print(f"   Modules matching output_emb by identity: {len(output_emb_matches)}")
for name, typ, mid in output_emb_matches:
    print(f"      - '{name}': {typ} (id={mid})")

print(f"\n7. Linear modules count:")
import torch.nn as nn
linear_count = sum(1 for name, module in model.named_modules() if isinstance(module, nn.Linear))
print(f"   Total Linear modules: {linear_count}")

print("\n" + "="*80)
print("TESTING EXCLUSION LOGIC MANUALLY")
print("="*80)

# Manually run the exclusion logic
from transformers import PreTrainedModel

linear_module_names = set()
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        linear_module_names.add(name)

module_names_to_exclude = set()
if isinstance(model, PreTrainedModel):
    output_emb = model.get_output_embeddings()
    if output_emb is not None:
        # This is the EXACT code from PEFT
        matches = [name for name, module in model.named_modules() if module is output_emb]
        print(f"Identity matches for output_emb: {matches}")
        if matches:
            last_module_name = matches[0]
            module_names_to_exclude.add(last_module_name)
            print(f"Will exclude: '{last_module_name}'")

print(f"\nLinear modules before exclusion: {len(linear_module_names)}")
print(f"Modules to exclude: {module_names_to_exclude}")
linear_module_names -= module_names_to_exclude
print(f"Linear modules after exclusion: {len(linear_module_names)}")
print(f"'lm_head' in excluded: {'lm_head' in module_names_to_exclude}")

print("\n" + "="*80)
print("APPLYING PEFT")
print("="*80)

peft_config = LoraConfig(
    inference_mode=False,
    r=12,
    target_modules="all-linear",
)
bug_model = get_peft_model(model, peft_config)

bug_model.print_trainable_parameters()

print("\n" + "="*80)
print("POST-PEFT ANALYSIS")
print("="*80)

print(f"bug_model.lm_head type: {type(bug_model.lm_head)}")
print(f"bug_model.lm_head: {bug_model.lm_head}")

# Check the string representation
lm_head_str = str(bug_model.lm_head)
print(f"\nString representation length: {len(lm_head_str)} chars")
print(f"First 500 chars:\n{lm_head_str[:500]}")

# Detailed check for LoRA
has_lora_in_str = 'lora' in lm_head_str.lower()
has_lora_in_type = 'lora' in str(type(bug_model.lm_head)).lower()
has_lora_attr = hasattr(bug_model.lm_head, 'lora_A')

print(f"\nLoRA detection:")
print(f"   'lora' in str(lm_head): {has_lora_in_str}")
print(f"   'lora' in type(lm_head): {has_lora_in_type}")
print(f"   hasattr(lm_head, 'lora_A'): {has_lora_attr}")

print("\n" + "="*80)
print("FINAL VERDICT")
print("="*80)

if has_lora_in_str or has_lora_in_type or has_lora_attr:
    print("❌ BUG REPRODUCED: lm_head has LoRA layers!")
    print(f"   Architecture: {platform.machine()}")
    print(f"   PEFT version: {peft.__version__}")
else:
    print("✅ Bug NOT reproduced: lm_head correctly excluded")
    print(f"   Architecture: {platform.machine()}")
    print(f"   PEFT version: {peft.__version__}")

print("\n" + "="*80)
print("ALL MODULES IN PEFT MODEL (first 50):")
print("="*80)
for i, (name, module) in enumerate(bug_model.named_modules()):
    if i < 50:
        print(f"   {name}: {type(module).__name__}")

and the output for that was:

================================================================================
EXHAUSTIVE DEBUGGING - ALL ANGLES
================================================================================
Architecture: arm64
Python: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 12:55:12) [Clang 14.0.6 ]
PyTorch: 2.9.0
PEFT: 0.17.1
Transformers: 4.57.1

================================================================================
LOADING MODEL - EXACT REPRODUCTION
================================================================================

1. Model structure:
   model.__class__: <class 'transformers.models.qwen3.modeling_qwen3.Qwen3Model'>
   model type: <class 'transformers.models.qwen3.modeling_qwen3.Qwen3Model'>

2. lm_head analysis:
   lm_head type: <class 'torch.nn.modules.linear.Linear'>
   lm_head id: 4729322864
   lm_head device: cpu

3. get_output_embeddings() analysis:
   output_emb: Linear(in_features=1024, out_features=151936, bias=False)
   output_emb type: <class 'torch.nn.modules.linear.Linear'>
   output_emb id: 4729322864
   output_emb device: cpu

4. Identity check:
   output_emb is model.lm_head: True
   output_emb == model.lm_head: True
   id(output_emb) == id(model.lm_head): True

5. Module registration:
   'lm_head' in model._modules: True
   'lm_head' in dir(model): True

6. named_modules() iteration:
   Total modules: 426
   Modules with 'lm_head': 1
      - 'lm_head': Linear (id=4729322864)
   Modules matching output_emb by identity: 1
      - 'lm_head': Linear (id=4729322864)

7. Linear modules count:
   Total Linear modules: 197

================================================================================
TESTING EXCLUSION LOGIC MANUALLY
================================================================================
Identity matches for output_emb: ['lm_head']
Will exclude: 'lm_head'

Linear modules before exclusion: 197
Modules to exclude: {'lm_head'}
Linear modules after exclusion: 196
'lm_head' in excluded: True

================================================================================
APPLYING PEFT
================================================================================
trainable params: 7,569,408 || all params: 759,201,792 || trainable%: 0.9970

================================================================================
POST-PEFT ANALYSIS
================================================================================
bug_model.lm_head type: <class 'torch.nn.modules.linear.Linear'>
bug_model.lm_head: Linear(in_features=1024, out_features=151936, bias=False)

String representation length: 57 chars
First 500 chars:
Linear(in_features=1024, out_features=151936, bias=False)

LoRA detection:
   'lora' in str(lm_head): False
   'lora' in type(lm_head): False
   hasattr(lm_head, 'lora_A'): False

================================================================================
FINAL VERDICT
================================================================================
✅ Bug NOT reproduced: lm_head correctly excluded
   Architecture: arm64
   PEFT version: 0.17.1

================================================================================
ALL MODULES IN PEFT MODEL (first 50):
================================================================================
   : PeftModel
   base_model: LoraModel
   base_model.model: Qwen3Model
   base_model.model.embed_tokens: Embedding
   base_model.model.layers: ModuleList
   base_model.model.layers.0: Qwen3DecoderLayer
   base_model.model.layers.0.self_attn: Qwen3Attention
   base_model.model.layers.0.self_attn.q_proj: Linear
   base_model.model.layers.0.self_attn.q_proj.base_layer: Linear
   base_model.model.layers.0.self_attn.q_proj.lora_dropout: ModuleDict
   base_model.model.layers.0.self_attn.q_proj.lora_dropout.default: Identity
   base_model.model.layers.0.self_attn.q_proj.lora_A: ModuleDict
   base_model.model.layers.0.self_attn.q_proj.lora_A.default: Linear
   base_model.model.layers.0.self_attn.q_proj.lora_B: ModuleDict
   base_model.model.layers.0.self_attn.q_proj.lora_B.default: Linear
   base_model.model.layers.0.self_attn.q_proj.lora_embedding_A: ParameterDict
   base_model.model.layers.0.self_attn.q_proj.lora_embedding_B: ParameterDict
   base_model.model.layers.0.self_attn.q_proj.lora_magnitude_vector: ModuleDict
   base_model.model.layers.0.self_attn.k_proj: Linear
   base_model.model.layers.0.self_attn.k_proj.base_layer: Linear
   base_model.model.layers.0.self_attn.k_proj.lora_dropout: ModuleDict
   base_model.model.layers.0.self_attn.k_proj.lora_dropout.default: Identity
   base_model.model.layers.0.self_attn.k_proj.lora_A: ModuleDict
   base_model.model.layers.0.self_attn.k_proj.lora_A.default: Linear
   base_model.model.layers.0.self_attn.k_proj.lora_B: ModuleDict
   base_model.model.layers.0.self_attn.k_proj.lora_B.default: Linear
   base_model.model.layers.0.self_attn.k_proj.lora_embedding_A: ParameterDict
   base_model.model.layers.0.self_attn.k_proj.lora_embedding_B: ParameterDict
   base_model.model.layers.0.self_attn.k_proj.lora_magnitude_vector: ModuleDict
   base_model.model.layers.0.self_attn.v_proj: Linear
   base_model.model.layers.0.self_attn.v_proj.base_layer: Linear
   base_model.model.layers.0.self_attn.v_proj.lora_dropout: ModuleDict
   base_model.model.layers.0.self_attn.v_proj.lora_dropout.default: Identity
   base_model.model.layers.0.self_attn.v_proj.lora_A: ModuleDict
   base_model.model.layers.0.self_attn.v_proj.lora_A.default: Linear
   base_model.model.layers.0.self_attn.v_proj.lora_B: ModuleDict
   base_model.model.layers.0.self_attn.v_proj.lora_B.default: Linear
   base_model.model.layers.0.self_attn.v_proj.lora_embedding_A: ParameterDict
   base_model.model.layers.0.self_attn.v_proj.lora_embedding_B: ParameterDict
   base_model.model.layers.0.self_attn.v_proj.lora_magnitude_vector: ModuleDict
   base_model.model.layers.0.self_attn.o_proj: Linear
   base_model.model.layers.0.self_attn.o_proj.base_layer: Linear
   base_model.model.layers.0.self_attn.o_proj.lora_dropout: ModuleDict
   base_model.model.layers.0.self_attn.o_proj.lora_dropout.default: Identity
   base_model.model.layers.0.self_attn.o_proj.lora_A: ModuleDict
   base_model.model.layers.0.self_attn.o_proj.lora_A.default: Linear
   base_model.model.layers.0.self_attn.o_proj.lora_B: ModuleDict
   base_model.model.layers.0.self_attn.o_proj.lora_B.default: Linear
   base_model.model.layers.0.self_attn.o_proj.lora_embedding_A: ParameterDict
   base_model.model.layers.0.self_attn.o_proj.lora_embedding_B: ParameterDict

Hence, I was neither able to reproduce the error nor find any hint of its probable cause here. @HuangChiEn, I think you could run this script as well and verify whether there are any significant differences between my findings and yours in any of these parameters; perhaps that could lead to some valuable information.
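To make that comparison easier, a small helper like the following (a sketch, reusing the bug_model from the scripts above) can dump the modules that actually received LoRA, so the x86 and ARM runs can be diffed directly:

from peft.tuners.tuners_utils import BaseTunerLayer

# Write out every module that was wrapped by a tuner layer, one name per line,
# so the two runs can be compared with a plain text diff.
targeted = sorted(
    name for name, module in bug_model.named_modules()
    if isinstance(module, BaseTunerLayer)
)
with open("targeted_modules.txt", "w") as f:
    f.write("\n".join(targeted))
print(f"{len(targeted)} modules wrapped with LoRA")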

sambhavnoobcoder avatar Oct 30 '25 18:10 sambhavnoobcoder

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Nov 28 '25 15:11 github-actions[bot]