ms-swift
Merging the DPO-trained LoRA adapter into InternVL2-26B fails
Merge script:
CUDA_VISIBLE_DEVICES=0 swift export \
--model_type internvl2-26b \
--model_id_or_path /root/f_data_1/InternVL2-26B \
--ckpt_dir "/root/visual_model/fine_tune/output/internvl2-26b/v1-20240810-171452/checkpoint-185" \
--merge_lora true
The checkpoint comes from DPO training with LoRA.
SFT checkpoints merge fine. DPO checkpoints also merged fine last week; this week, in a freshly set-up environment, this bug appeared. Merging manually with peft gives the same error.
Error:
Traceback (most recent call last):
File "/root/visual_model/swift/swift/cli/export.py", line 5, in <module>
export_main()
File "/root/visual_model/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/root/visual_model/swift/swift/llm/export.py", line 190, in llm_export
merge_lora(args, device_map=args.merge_device_map)
File "/root/visual_model/swift/swift/llm/infer.py", line 113, in merge_lora
model, template = prepare_model_template(args, device_map=device_map, verbose=False)
File "/root/visual_model/swift/swift/llm/infer.py", line 230, in prepare_model_template
model = Swift.from_pretrained(model, args.ckpt_dir, inference_mode=True)
File "/root/visual_model/swift/swift/tuners/base.py", line 878, in from_pretrained
peft_model = load_peft_model(model, 'default')
File "/root/visual_model/swift/swift/tuners/base.py", line 864, in load_peft_model
return PeftModel.from_pretrained(
File "/root/visual_model/swift/swift/tuners/peft.py", line 367, in from_pretrained
return module_class.from_pretrained(model, model_id, *args, **kwargs)
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/peft_model.py", line 541, in from_pretrained
model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/peft_model.py", line 1542, in __init__
super().__init__(model, peft_config, adapter_name, **kwargs)
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/peft_model.py", line 155, in __init__
self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
File "/root/visual_model/swift/swift/tuners/peft.py", line 315, in init
self.__init_origin__(model, config, adapter_name)
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 139, in __init__
super().__init__(model, config, adapter_name)
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 175, in __init__
self.inject_adapter(self.model, adapter_name)
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 431, in inject_adapter
self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
File "/root/visual_model/swift/swift/tuners/peft.py", line 97, in _create_and_replace_hook
return self._create_and_replace_origin(*args, **kwargs)
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 224, in _create_and_replace
new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 346, in _create_new_module
raise ValueError(
ValueError: Target module InternLM2ForCausalLM(
(model): InternLM2Model(
(tok_embeddings): Embedding(92553, 6144, padding_idx=2)
(layers): ModuleList(
(0-47): 48 x InternLM2DecoderLayer(
(attention): InternLM2Attention(
(wqkv): Linear(in_features=6144, out_features=8192, bias=False)
(wo): Linear(in_features=6144, out_features=6144, bias=False)
(rotary_emb): InternLM2DynamicNTKScalingRotaryEmbedding()
)
(feed_forward): InternLM2MLP(
(w1): Linear(in_features=6144, out_features=16384, bias=False)
(w3): Linear(in_features=6144, out_features=16384, bias=False)
(w2): Linear(in_features=16384, out_features=6144, bias=False)
(act_fn): SiLU()
)
(attention_norm): InternLM2RMSNorm()
(ffn_norm): InternLM2RMSNorm()
)
)
(norm): InternLM2RMSNorm()
)
(output): Linear(in_features=6144, out_features=92553, bias=False)
) is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
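Note that the "target module" in the error is the entire `InternLM2ForCausalLM`, not a single `nn.Linear`, which suggests the target-module matching selected the model root instead of the leaf layers. For context, the merge that `--merge_lora true` performs is just per-Linear arithmetic, W' = W + (alpha / r) * B @ A, sketched here in numpy (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-layer LoRA merge: W' = W + (alpha / r) * B @ A
d_out, d_in, r, alpha = 6, 8, 4, 8
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # lora_A (down-projection)
B = rng.standard_normal((d_out, r))      # lora_B (up-projection)

W_merged = W + (alpha / r) * B @ A

# The merged layer matches base + adapter applied separately.
x = rng.standard_normal(d_in)
y_separate = W @ x + (alpha / r) * (B @ (A @ x))
print(np.allclose(W_merged @ x, y_separate))  # True
```

This is why the merge only makes sense on plain `nn.Linear` (or the other listed leaf types): there is no single weight matrix to fold the delta into when the matched "module" is the whole model.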
This might be a bug; I'll try to reproduce it.
I asked in the group chat and pulled Monday's latest code; merging now works, but after the merged weights are written the process never finishes and just hangs there. In the end it froze both of my A100s. There may be another bug.
OK, I'm reproducing it now; please wait.
The InternVL 20B feels less capable than the 7B.
Haha, next week I'll try the 8B, 26B, and 40B all of them.
This should be fixed now; if the problem persists, please reopen the issue.