Environment:
transformers 4.34.0
torch 2.0.1+cu118
deepspeed 0.12.4
flash-attn 2.3.2
Script: finetune_lora_ds.sh (ZeRO-3)
Code version: latest code
Error log:
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.5232646465301514 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.6317603588104248 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.5552959442138672 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.5726652145385742 seconds
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.546497106552124 seconds
Time to load cpu_adam op: 0.5796217918395996 seconds
Parameter Offload: Total persistent parameters: 3284992 in 243 params
Traceback (most recent call last):
  File "/home/xiaoi/pan/ssh/Qwen/finetune.py", line 360, in <module>
    train()
  File "/home/xiaoi/pan/ssh/Qwen/finetune.py", line 353, in train
    trainer.train()
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/transformers/trainer.py", line 1892, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/transformers/trainer.py", line 2776, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/transformers/trainer.py", line 2801, in compute_loss
    outputs = model(**inputs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1822, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/peft/peft_model.py", line 918, in forward
    return self.base_model(
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/xiaoi/.cache/huggingface/modules/transformers_modules/Qwen-72B/modeling_qwen.py", line 1045, in forward
    transformer_outputs = self.transformer(
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/xiaoi/.cache/huggingface/modules/transformers_modules/Qwen-72B/modeling_qwen.py", line 824, in forward
    inputs_embeds = self.wte(input_ids)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/peft/utils/other.py", line 186, in forward
    return self.modules_to_save[self.active_adapter](*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    result = hook(self, args)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 392, in _pre_forward_module_hook
    self.pre_sub_module_forward_function(module)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 505, in pre_sub_module_forward_function
    param_coordinator.fetch_sub_module(sub_module, forward=prev_grad_state)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/xiaoi/anaconda3/envs/torch_p/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 310, in fetch_sub_module
    assert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary()
AssertionError: {'id': 643, 'status': 'NOT_AVAILABLE', 'numel': 0, 'ds_numel': 0, 'shape': (0,), 'ds_shape': (0,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {7}, 'ds_tensor.shape': torch.Size([0])}
Hi, are you fine-tuning the 72B Base model (could you share the exact model name)?
As noted in the documentation, if you fine-tune a Base model (i.e., a model whose name does not contain "chat"), the embedding is added to the fine-tuned parameters, and ZeRO 3 still has the problem described in this issue with that setup. We suggest modifying finetune.py to explicitly exclude the embedding from the fine-tuned parameters:

if lora_args.q_lora or 'chat' in model_args.model_name_or_path.lower():
    modules_to_save = None
else:
    # modules_to_save = ["wte", "lm_head"]
    modules_to_save = None  # change to this line
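For context, a minimal sketch of where this variable ends up, assuming the standard PEFT API; the exact LoRA hyperparameters and target module names below are assumptions and may differ from what finetune.py uses:

from peft import LoraConfig, get_peft_model

# Sketch only: with modules_to_save = None, only the LoRA adapter weights are
# registered as trainable, so wte and lm_head stay partitioned/frozen under ZeRO 3.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # assumption: Qwen attention/MLP projections
    task_type="CAUSAL_LM",
    modules_to_save=modules_to_save,  # None for Base models, per the workaround above
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()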
If a Base model is fine-tuned without saving wte and lm_head, what should we expect from the output? Is it just that the two special tokens <|im_start|> and <|im_end|> cannot be learned? In other words, what kind of performance loss does this cause for the trained model?
Same question here.
@Luobots @chenyzh28 When fine-tuning a Base model without fine-tuning the embedding, the two special tokens cannot be learned, which may have some impact on performance; we do not yet have detailed data on how large that impact is. We are working on code to fix the issue that prevents fine-tuning the embedding of Base models.
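As a quick way to see what this means in practice, here is an illustrative sketch (not taken from finetune.py) that checks which parameters remain trainable after the workaround above; model is assumed to be the PEFT-wrapped model returned by get_peft_model:

# Count trainable tensors and confirm the embedding and output head stay frozen
# when modules_to_save is None; only LoRA adapter weights should require grads.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"trainable tensors: {len(trainable)}")
for name, param in model.named_parameters():
    if "wte" in name or "lm_head" in name:
        print(name, "requires_grad =", param.requires_grad)  # expected: False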
I tried it: after fine-tuning, generation may fail to stop. Once the normal response is complete, the model keeps generating (apparently) random tokens, and decoding may also raise an "unknown ids" error.
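One possible mitigation at inference time, sketched below, is to pass the <|im_end|> id explicitly as an end-of-sequence token so decoding stops even if the fine-tuned Base model never learned to emit it reliably. Whether convert_tokens_to_ids resolves the literal token string with Qwen's tiktoken-based tokenizer is an assumption, so treat this as illustrative only:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-72B", trust_remote_code=True, device_map="auto"
)

# Assumption: the tokenizer maps the literal special-token string to its id.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=im_end_id)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))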