peft " target_modules='all-linear' " have different behavior between x86 and aarch ?
System Info
I have tested on two architectures (x86 and ARM) and found this bug. Both have peft==0.17.1.
Who can help?
@benjaminbossan @githubnemo
Reproduction
Reproduction script: bug_reprod.py
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained("OpenGVLab/InternVL3_5-1B-HF", trust_remote_code=True)
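# detach lm_head from the full image-text model, keep only the inner language model,
# and re-attach lm_head to it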
lm_head = model.lm_head
model = model.language_model
model.lm_head = lm_head
from peft import get_peft_model
from peft import LoraConfig
peft_config = LoraConfig(
    inference_mode=False,
    r=12,
    target_modules="all-linear",
)
bug_model = get_peft_model(model, peft_config)
bug_model.print_trainable_parameters()
breakpoint()  # p bug_model; you will find that lm_head has different results between the two architectures
Put bug_reprod.py on an x86 machine and an aarch machine and run it; you will find that the results for lm_head differ. The following screenshots show the error:
[Screenshot: x86 output]
[Screenshot: aarch output]
Expected behavior
target_modules='all-linear' should exclude lm_head from LoRA tuning. At the very least, the x86 and ARM architectures should behave identically.
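For reference, a minimal programmatic check of this expectation, as a sketch that reuses bug_model from the reproduction script above (checking hasattr(lm_head, 'lora_A') is assumed here to be a reliable indicator of an attached LoRA adapter):

import torch.nn as nn

# lm_head should still be a plain nn.Linear with no LoRA sub-modules attached.
lm_head = bug_model.lm_head
assert isinstance(lm_head, nn.Linear), f"lm_head was wrapped: {type(lm_head)}"
assert not hasattr(lm_head, "lora_A"), "lm_head unexpectedly carries LoRA adapters"
print("lm_head is excluded from LoRA, as expected")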
Hmm, that's a very strange problem. Unfortunately, I don't have an ARM machine to test this one, what exactly are you using?
The logic to determine what layers to exclude is found here in PEFT:
https://github.com/huggingface/peft/blob/5d58b515672d31c57cc61a4c50a6635df645ec20/src/peft/tuners/tuners_utils.py#L1782-L1807
peft/src/peft/tuners/tuners_utils.py, lines 1782 to 1807 at 5d58b51:

# Try to remove linear layers that should not be targeted as best as possible. We have to rely on convention as
# there are no hard rules to detect these modules.
module_names_to_exclude = set()
if isinstance(model, PreTrainedModel):
    output_emb = model.get_output_embeddings()
    if output_emb is not None:
        # ignore the last classification head for text generation models
        last_module_name = [name for name, module in model.named_modules() if module is output_emb][0]
        module_names_to_exclude.add(last_module_name)
    elif peft_config.task_type == TaskType.SEQ_CLS:
        # ignore classifier head for classification models (issue 2027)
        # there is no fix name for the classifier head, so check the common ones
        for name in SEQ_CLS_HEAD_NAMES:
            cls_head = getattr(model, name, None)
            if cls_head is not None:
                last_module_name = [name for name, module in model.named_modules() if module is cls_head][0]
                module_names_to_exclude.add(last_module_name)
                break

# we don't want nested LoRA layers, i.e. LoRA being applied to possibly existing lora_A, lora_B, etc.
# see 2390
for prefix, module in model.named_modules():
    if isinstance(module, BaseTunerLayer):
        for suffix, child in module.named_modules():
            if suffix:
                module_names_to_exclude.add(f"{prefix}.{suffix}")

I really don't see how the architecture can influence this. What you could try to help debugging is to set a breakpoint in this part of PEFT and see how the code execution differs between the two architectures.
If you just want a solution to your issue, the most robust way would be to set target_modules=[...] where you list all layers you want to target explicitly.
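To make that workaround concrete, here is a sketch of an explicit configuration; the module names are assumed from the standard Qwen3 attention/MLP projections and should be verified against model.named_modules() for your own model:

from peft import LoraConfig

# Sketch: list the layers to target explicitly so lm_head can never be picked up,
# regardless of how "all-linear" is resolved. Names assume the usual Qwen3 layout.
peft_config = LoraConfig(
    inference_mode=False,
    r=12,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)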
Yes, indeed it's strange... Thanks for your quick reply, I'll set a breakpoint to inspect it.
I may provide some info here later.
Hi @BenjaminBossan,
Jumping in here — and just to clarify for you @HuangChiEn, I’m not a maintainer, just another contributor like you. I went through the discussion between you both and figured I’d test this out on ARM as well. While I also don’t see why the architecture would cause any behavioural difference, I tried reproducing the issue on my device but couldn’t.
I wanted to be sure that nothing was amiss, so I also wrote another very rough validator to check everything else:
import platform
import sys
import torch
print("="*80)
print("EXHAUSTIVE DEBUGGING - ALL ANGLES")
print("="*80)
print(f"Architecture: {platform.machine()}")
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
import peft
import transformers
print(f"PEFT: {peft.__version__}")
print(f"Transformers: {transformers.__version__}")
from transformers import AutoModelForImageTextToText
from peft import LoraConfig, get_peft_model
print("\n" + "="*80)
print("LOADING MODEL - EXACT REPRODUCTION")
print("="*80)
model = AutoModelForImageTextToText.from_pretrained("OpenGVLab/InternVL3_5-1B-HF", trust_remote_code=True)
lm_head = model.lm_head
model = model.language_model
model.lm_head = lm_head
print(f"\n1. Model structure:")
print(f" model.__class__: {model.__class__}")
print(f" model type: {type(model)}")
print(f"\n2. lm_head analysis:")
print(f" lm_head type: {type(model.lm_head)}")
print(f" lm_head id: {id(model.lm_head)}")
print(f" lm_head device: {model.lm_head.weight.device}")
print(f"\n3. get_output_embeddings() analysis:")
output_emb = model.get_output_embeddings()
print(f" output_emb: {output_emb}")
print(f" output_emb type: {type(output_emb)}")
print(f" output_emb id: {id(output_emb)}")
if output_emb is not None and hasattr(output_emb, 'weight'):
    print(f" output_emb device: {output_emb.weight.device}")
print(f"\n4. Identity check:")
print(f" output_emb is model.lm_head: {output_emb is model.lm_head}")
print(f" output_emb == model.lm_head: {output_emb == model.lm_head}")
print(f" id(output_emb) == id(model.lm_head): {id(output_emb) == id(model.lm_head)}")
print(f"\n5. Module registration:")
print(f" 'lm_head' in model._modules: {'lm_head' in model._modules}")
print(f" 'lm_head' in dir(model): {'lm_head' in dir(model)}")
print(f"\n6. named_modules() iteration:")
all_names = []
lm_head_matches = []
output_emb_matches = []
for name, module in model.named_modules():
    all_names.append(name)
    if 'lm_head' in name or name == 'lm_head':
        lm_head_matches.append((name, type(module).__name__, id(module)))
    if module is output_emb:
        output_emb_matches.append((name, type(module).__name__, id(module)))
print(f" Total modules: {len(all_names)}")
print(f" Modules with 'lm_head': {len(lm_head_matches)}")
for name, typ, mid in lm_head_matches:
    print(f" - '{name}': {typ} (id={mid})")
print(f" Modules matching output_emb by identity: {len(output_emb_matches)}")
for name, typ, mid in output_emb_matches:
    print(f" - '{name}': {typ} (id={mid})")
print(f"\n7. Linear modules count:")
import torch.nn as nn
linear_count = sum(1 for name, module in model.named_modules() if isinstance(module, nn.Linear))
print(f" Total Linear modules: {linear_count}")
print("\n" + "="*80)
print("TESTING EXCLUSION LOGIC MANUALLY")
print("="*80)
# Manually run the exclusion logic
from transformers import PreTrainedModel
linear_module_names = set()
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        linear_module_names.add(name)
module_names_to_exclude = set()
if isinstance(model, PreTrainedModel):
    output_emb = model.get_output_embeddings()
    if output_emb is not None:
        # This is the EXACT code from PEFT
        matches = [name for name, module in model.named_modules() if module is output_emb]
        print(f"Identity matches for output_emb: {matches}")
        if matches:
            last_module_name = matches[0]
            module_names_to_exclude.add(last_module_name)
            print(f"Will exclude: '{last_module_name}'")
print(f"\nLinear modules before exclusion: {len(linear_module_names)}")
print(f"Modules to exclude: {module_names_to_exclude}")
linear_module_names -= module_names_to_exclude
print(f"Linear modules after exclusion: {len(linear_module_names)}")
print(f"'lm_head' in excluded: {'lm_head' in module_names_to_exclude}")
print("\n" + "="*80)
print("APPLYING PEFT")
print("="*80)
peft_config = LoraConfig(
    inference_mode=False,
    r=12,
    target_modules="all-linear",
)
bug_model = get_peft_model(model, peft_config)
bug_model.print_trainable_parameters()
print("\n" + "="*80)
print("POST-PEFT ANALYSIS")
print("="*80)
print(f"bug_model.lm_head type: {type(bug_model.lm_head)}")
print(f"bug_model.lm_head: {bug_model.lm_head}")
# Check the string representation
lm_head_str = str(bug_model.lm_head)
print(f"\nString representation length: {len(lm_head_str)} chars")
print(f"First 500 chars:\n{lm_head_str[:500]}")
# Detailed check for LoRA
has_lora_in_str = 'lora' in lm_head_str.lower()
has_lora_in_type = 'lora' in str(type(bug_model.lm_head)).lower()
has_lora_attr = hasattr(bug_model.lm_head, 'lora_A')
print(f"\nLoRA detection:")
print(f" 'lora' in str(lm_head): {has_lora_in_str}")
print(f" 'lora' in type(lm_head): {has_lora_in_type}")
print(f" hasattr(lm_head, 'lora_A'): {has_lora_attr}")
print("\n" + "="*80)
print("FINAL VERDICT")
print("="*80)
if has_lora_in_str or has_lora_in_type or has_lora_attr:
    print("❌ BUG REPRODUCED: lm_head has LoRA layers!")
    print(f" Architecture: {platform.machine()}")
    print(f" PEFT version: {peft.__version__}")
else:
    print("✅ Bug NOT reproduced: lm_head correctly excluded")
    print(f" Architecture: {platform.machine()}")
    print(f" PEFT version: {peft.__version__}")
print("\n" + "="*80)
print("ALL MODULES IN PEFT MODEL (first 50):")
print("="*80)
for i, (name, module) in enumerate(bug_model.named_modules()):
    if i < 50:
        print(f" {name}: {type(module).__name__}")
and the output for that was:
================================================================================
EXHAUSTIVE DEBUGGING - ALL ANGLES
================================================================================
Architecture: arm64
Python: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 12:55:12) [Clang 14.0.6 ]
PyTorch: 2.9.0
PEFT: 0.17.1
Transformers: 4.57.1
================================================================================
LOADING MODEL - EXACT REPRODUCTION
================================================================================
1. Model structure:
model.__class__: <class 'transformers.models.qwen3.modeling_qwen3.Qwen3Model'>
model type: <class 'transformers.models.qwen3.modeling_qwen3.Qwen3Model'>
2. lm_head analysis:
lm_head type: <class 'torch.nn.modules.linear.Linear'>
lm_head id: 4729322864
lm_head device: cpu
3. get_output_embeddings() analysis:
output_emb: Linear(in_features=1024, out_features=151936, bias=False)
output_emb type: <class 'torch.nn.modules.linear.Linear'>
output_emb id: 4729322864
output_emb device: cpu
4. Identity check:
output_emb is model.lm_head: True
output_emb == model.lm_head: True
id(output_emb) == id(model.lm_head): True
5. Module registration:
'lm_head' in model._modules: True
'lm_head' in dir(model): True
6. named_modules() iteration:
Total modules: 426
Modules with 'lm_head': 1
- 'lm_head': Linear (id=4729322864)
Modules matching output_emb by identity: 1
- 'lm_head': Linear (id=4729322864)
7. Linear modules count:
Total Linear modules: 197
================================================================================
TESTING EXCLUSION LOGIC MANUALLY
================================================================================
Identity matches for output_emb: ['lm_head']
Will exclude: 'lm_head'
Linear modules before exclusion: 197
Modules to exclude: {'lm_head'}
Linear modules after exclusion: 196
'lm_head' in excluded: True
================================================================================
APPLYING PEFT
================================================================================
trainable params: 7,569,408 || all params: 759,201,792 || trainable%: 0.9970
================================================================================
POST-PEFT ANALYSIS
================================================================================
bug_model.lm_head type: <class 'torch.nn.modules.linear.Linear'>
bug_model.lm_head: Linear(in_features=1024, out_features=151936, bias=False)
String representation length: 57 chars
First 500 chars:
Linear(in_features=1024, out_features=151936, bias=False)
LoRA detection:
'lora' in str(lm_head): False
'lora' in type(lm_head): False
hasattr(lm_head, 'lora_A'): False
================================================================================
FINAL VERDICT
================================================================================
✅ Bug NOT reproduced: lm_head correctly excluded
Architecture: arm64
PEFT version: 0.17.1
================================================================================
ALL MODULES IN PEFT MODEL (first 50):
================================================================================
: PeftModel
base_model: LoraModel
base_model.model: Qwen3Model
base_model.model.embed_tokens: Embedding
base_model.model.layers: ModuleList
base_model.model.layers.0: Qwen3DecoderLayer
base_model.model.layers.0.self_attn: Qwen3Attention
base_model.model.layers.0.self_attn.q_proj: Linear
base_model.model.layers.0.self_attn.q_proj.base_layer: Linear
base_model.model.layers.0.self_attn.q_proj.lora_dropout: ModuleDict
base_model.model.layers.0.self_attn.q_proj.lora_dropout.default: Identity
base_model.model.layers.0.self_attn.q_proj.lora_A: ModuleDict
base_model.model.layers.0.self_attn.q_proj.lora_A.default: Linear
base_model.model.layers.0.self_attn.q_proj.lora_B: ModuleDict
base_model.model.layers.0.self_attn.q_proj.lora_B.default: Linear
base_model.model.layers.0.self_attn.q_proj.lora_embedding_A: ParameterDict
base_model.model.layers.0.self_attn.q_proj.lora_embedding_B: ParameterDict
base_model.model.layers.0.self_attn.q_proj.lora_magnitude_vector: ModuleDict
base_model.model.layers.0.self_attn.k_proj: Linear
base_model.model.layers.0.self_attn.k_proj.base_layer: Linear
base_model.model.layers.0.self_attn.k_proj.lora_dropout: ModuleDict
base_model.model.layers.0.self_attn.k_proj.lora_dropout.default: Identity
base_model.model.layers.0.self_attn.k_proj.lora_A: ModuleDict
base_model.model.layers.0.self_attn.k_proj.lora_A.default: Linear
base_model.model.layers.0.self_attn.k_proj.lora_B: ModuleDict
base_model.model.layers.0.self_attn.k_proj.lora_B.default: Linear
base_model.model.layers.0.self_attn.k_proj.lora_embedding_A: ParameterDict
base_model.model.layers.0.self_attn.k_proj.lora_embedding_B: ParameterDict
base_model.model.layers.0.self_attn.k_proj.lora_magnitude_vector: ModuleDict
base_model.model.layers.0.self_attn.v_proj: Linear
base_model.model.layers.0.self_attn.v_proj.base_layer: Linear
base_model.model.layers.0.self_attn.v_proj.lora_dropout: ModuleDict
base_model.model.layers.0.self_attn.v_proj.lora_dropout.default: Identity
base_model.model.layers.0.self_attn.v_proj.lora_A: ModuleDict
base_model.model.layers.0.self_attn.v_proj.lora_A.default: Linear
base_model.model.layers.0.self_attn.v_proj.lora_B: ModuleDict
base_model.model.layers.0.self_attn.v_proj.lora_B.default: Linear
base_model.model.layers.0.self_attn.v_proj.lora_embedding_A: ParameterDict
base_model.model.layers.0.self_attn.v_proj.lora_embedding_B: ParameterDict
base_model.model.layers.0.self_attn.v_proj.lora_magnitude_vector: ModuleDict
base_model.model.layers.0.self_attn.o_proj: Linear
base_model.model.layers.0.self_attn.o_proj.base_layer: Linear
base_model.model.layers.0.self_attn.o_proj.lora_dropout: ModuleDict
base_model.model.layers.0.self_attn.o_proj.lora_dropout.default: Identity
base_model.model.layers.0.self_attn.o_proj.lora_A: ModuleDict
base_model.model.layers.0.self_attn.o_proj.lora_A.default: Linear
base_model.model.layers.0.self_attn.o_proj.lora_B: ModuleDict
base_model.model.layers.0.self_attn.o_proj.lora_B.default: Linear
base_model.model.layers.0.self_attn.o_proj.lora_embedding_A: ParameterDict
base_model.model.layers.0.self_attn.o_proj.lora_embedding_B: ParameterDict
Hence, I was neither able to reproduce the error nor to find any hint of its probable cause here. @HuangChiEn, I think you could run this script as well and check whether there are any significant differences between my findings and yours in any of these parameters; perhaps that could surface some valuable information.
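If it helps with the comparison, the environment facts printed at the top of the script could also be dumped to JSON on each machine and diffed directly. A minimal sketch using only fields already collected above:

import json
import platform
import sys

import peft
import torch
import transformers

# Write the key environment facts to a file named after the architecture,
# so the x86 and ARM reports can be diffed side by side.
report = {
    "architecture": platform.machine(),
    "python": sys.version,
    "torch": torch.__version__,
    "peft": peft.__version__,
    "transformers": transformers.__version__,
}
with open(f"env_{platform.machine()}.json", "w") as f:
    json.dump(report, f, indent=2)
print(json.dumps(report, indent=2))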
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.