Script:
CUDA_VISIBLE_DEVICES=0 python ../src/train_sft.py \
    --do_train \
    --model_name_or_path ~/models/pretrain/chatglm2-6b \
    --dataset alpaca_gpt4_zh \
    --dataset_dir ../data \
    --finetuning_type p_tuning \
    --output_dir ../output/ \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 30.0 \
    --plot_loss \
    --use_v2
Error:
0%| | 0/91530 [00:00<?, ?it/s]Traceback (most recent call last):
File "../src/train_sft.py", line 105, in
main()
File "../src/train_sft.py", line 73, in main
train_result = trainer.train()
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 1645, in train
return inner_training_loop(
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 2759, in training_step
loss = self.compute_loss(model, inputs)
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 2784, in compute_loss
outputs = model(**inputs)
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 928, in forward
transformer_outputs = self.transformer(
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 824, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 628, in forward
layer_ret = torch.utils.checkpoint.checkpoint(
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 541, in forward
attention_output, kv_cache = self.self_attention(
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 438, in forward
context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask)
File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 228, in forward
context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
RuntimeError: The size of tensor a (247) must match the size of tensor b (231) at non-singleton dimension 3
Same issue here. It looks like P-Tuning hasn't been adapted for this model: the size difference is exactly the pre_seq_len I set.
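A minimal sketch of what that mismatch looks like, using made-up tensor sizes chosen to match the numbers in the traceback above (231 tokens plus a hypothetical pre_seq_len of 16):

import torch
import torch.nn.functional as F

# With a P-Tuning v2 prefix, the prefix key/values are prepended to each
# layer's key/value, so their length becomes seq_len + pre_seq_len, while
# the attention mask is still built for seq_len positions only.
batch, heads, head_dim = 1, 2, 64        # small, arbitrary sizes
seq_len, pre_seq_len = 231, 16           # 231 + 16 = 247, as in the error

query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len + pre_seq_len, head_dim)
value = torch.randn(batch, heads, seq_len + pre_seq_len, head_dim)
attn_mask = torch.zeros(batch, 1, seq_len, seq_len)   # missing the prefix columns

# Raises: RuntimeError: The size of tensor a (247) must match the size of
# tensor b (231) at non-singleton dimension 3
F.scaled_dot_product_attention(query, key, value, attn_mask)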
Please update the code and the .py files in the ChatGLM2 model directory, then add the --fp16 argument and retry.
File "/home/xxxx/.cache/huggingface/modules/transformers_modules/chatglm2/modeling_chatglm.py", line 438, in forward
context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask)
File "/home/xxxx/anaconda3/envs/linglong0.1/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxxx/.cache/huggingface/modules/transformers_modules/chatglm2/modeling_chatglm.py", line 228, in forward
context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
RuntimeError: The size of tensor a (402) must match the size of tensor b (338) at non-singleton dimension 3
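For what it's worth, the shapes (402 vs. 338, again differing by pre_seq_len) suggest the attention mask is not being extended to cover the prefix key/values injected by P-Tuning v2. A generic sketch of that idea, assuming an SDPA-style mask; this is not the repository's actual patch, and the call site and mask convention are assumptions:

import torch
import torch.nn.functional as F

def sdpa_with_prefix(query, key, value, attn_mask):
    # key/value already include the prefix positions, but attn_mask may only
    # cover the original sequence; left-pad it so every query can attend to
    # the prefix before calling scaled_dot_product_attention.
    if attn_mask is not None and attn_mask.size(-1) != key.size(-2):
        prefix_len = key.size(-2) - attn_mask.size(-1)
        pad_shape = (*attn_mask.shape[:-1], prefix_len)
        if attn_mask.dtype == torch.bool:
            # Boolean SDPA masks: True means the position takes part in attention.
            pad = torch.ones(pad_shape, dtype=torch.bool, device=attn_mask.device)
        else:
            # Additive float masks: 0.0 leaves the attention scores unchanged.
            pad = torch.zeros(pad_shape, dtype=attn_mask.dtype, device=attn_mask.device)
        attn_mask = torch.cat([pad, attn_mask], dim=-1)
    return F.scaled_dot_product_attention(query, key, value, attn_mask)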