ChatGLM-Efficient-Tuning

ChatGLM2 p-tuning error: RuntimeError: The size of tensor a (247) must match the size of tensor b (231) at non-singleton dimension 3

Open lcl1990 opened this issue 1 year ago • 4 comments

Script:

CUDA_VISIBLE_DEVICES=0 python ../src/train_sft.py \
    --do_train \
    --model_name_or_path ~/models/pretrain/chatglm2-6b \
    --dataset alpaca_gpt4_zh \
    --dataset_dir ../data \
    --finetuning_type p_tuning \
    --output_dir ../output/ \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 30.0 \
    --plot_loss \
    --use_v2

Error:

0%|          | 0/91530 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "../src/train_sft.py", line 105, in <module>
    main()
  File "../src/train_sft.py", line 73, in main
    train_result = trainer.train()
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 2759, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/transformers/trainer.py", line 2784, in compute_loss
    outputs = model(**inputs)
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 928, in forward
    transformer_outputs = self.transformer(
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 824, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 628, in forward
    layer_ret = torch.utils.checkpoint.checkpoint(
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 541, in forward
    attention_output, kv_cache = self.self_attention(
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 438, in forward
    context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask)
  File "/home/xxx/anaconda3/envs/chatglm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 228, in forward
    context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
RuntimeError: The size of tensor a (247) must match the size of tensor b (231) at non-singleton dimension 3

lcl1990 · Jul 03 '23

Same here. It looks like p-tuning hasn't been adapted yet: the difference between the two dimensions is exactly the pre_seq_len I set.

Julylmm · Jul 04 '23
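For context on the observation above, here is a minimal, self-contained PyTorch sketch (illustrative tensor names and sizes only, not the actual ChatGLM2 code) of why the two dimensions differ by exactly pre_seq_len when the p-tuning prefix is prepended to the keys/values but the attention mask is not extended:

import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 2, 231, 8
pre_seq_len = 16  # hypothetical prefix length; 247 - 231 = 16 in the report above

q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# p-tuning v2 prepends pre_seq_len learned key/value vectors to every layer,
# so the key/value sequence length grows to seq_len + pre_seq_len = 247 ...
prefix_k = torch.randn(batch, heads, pre_seq_len, head_dim)
prefix_v = torch.randn(batch, heads, pre_seq_len, head_dim)
k = torch.cat([prefix_k, k], dim=2)
v = torch.cat([prefix_v, v], dim=2)

# ... but the attention mask is still built from the input tokens alone,
# so its last dimension stays at 231.
mask = torch.ones(batch, 1, seq_len, seq_len, dtype=torch.bool)

# On PyTorch 2.0.x this raises the same error as above, because the
# (231, 231) mask cannot be broadcast against the (231, 247) attention scores.
F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Conceptually, the fix is to pad the mask with pre_seq_len always-attended
# positions on the key axis so the shapes line up again, e.g.:
# mask = torch.cat([mask.new_ones(batch, 1, seq_len, pre_seq_len), mask], dim=3)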

Please update the repository code and the .py files in your ChatGLM2 model directory, then add the --fp16 flag and retry.

hiyouga · Jul 04 '23
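For reference, one way to apply that suggestion is to pull the latest repository code, refresh the .py files in the local chatglm2-6b directory, and re-run the original command with --fp16 appended (all other arguments and paths carried over from the first post):

# inside the ChatGLM-Efficient-Tuning checkout: update the repo code first
git pull

# then re-run the original script with --fp16 added
CUDA_VISIBLE_DEVICES=0 python ../src/train_sft.py \
    --do_train \
    --model_name_or_path ~/models/pretrain/chatglm2-6b \
    --dataset alpaca_gpt4_zh \
    --dataset_dir ../data \
    --finetuning_type p_tuning \
    --output_dir ../output/ \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 30.0 \
    --plot_loss \
    --fp16 \
    --use_v2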

I'm getting the same error, and have been for a while. How did you solve it?

Shkklt · Jul 07 '23

File "/home/xxxx/.cache/huggingface/modules/transformers_modules/chatglm2/modeling_chatglm.py", line 438, in forward context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask) File "/home/xxxx/anaconda3/envs/linglong0.1/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/home/xxxx/.cache/huggingface/modules/transformers_modules/chatglm2/modeling_chatglm.py", line 228, in forward context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer, RuntimeError: The size of tensor a (402) must match the size of tensor b (338) at non-singleton dimension 3

Shkklt · Jul 07 '23