
Attention shape error when fine-tuning internlm2_5_7b_chat on a custom dataset

Open · Stardust-y opened this issue 10 months ago · 1 comment

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 360, in <module>
[rank0]:     main()
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 356, in main
[rank0]:     runner.train()
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1200, in train
[rank0]:     model = self.train_loop.run()  # type: ignore
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/loops.py", line 273, in run
[rank0]:     self.runner.call_hook('before_train')
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1271, in call_hook
[rank0]:     getattr(hook, fn_name)(self, **kwargs)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 234, in before_train
[rank0]:     self._generate_samples(runner, max_new_tokens=50)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 223, in _generate_samples
[rank0]:     self._eval_language(runner, model, device, max_new_tokens,
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 181, in _eval_language
[rank0]:     generation_output = model.generate(
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/peft_model.py", line 1491, in generate
[rank0]:     outputs = self.base_model.generate(*args, **kwargs)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2223, in generate
[rank0]:     result = self._sample(
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 3214, in _sample
[rank0]:     outputs = model_forward(**model_inputs, return_dict=True)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 1215, in forward
[rank0]:     outputs = self.model(
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 1010, in forward
[rank0]:     layer_outputs = decoder_layer(
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 744, in forward
[rank0]:     hidden_states, self_attn_weights, present_key_value = self.attention(
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 343, in forward
[rank0]:     attn_weights = attn_weights + causal_mask
[rank0]: RuntimeError: The size of tensor a (41) must match the size of tensor b (40) at non-singleton dimension 3
[rank0]:[W227 03:06:08.848877297 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())

I printed the key tensors and found that after the first round of sampling (batch_size=32), seq_length becomes 1 as it continues, which leads to the attention computation error above. Could this be a version mismatch? torch=2.5.1, transformers=4.49.0
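For context on seq_length becoming 1: during cached generation, transformers feeds only the newest token to the model after the initial prompt pass, so a query length of 1 at that point is expected; what has to grow together with the KV cache is the key/mask dimension. A hypothetical sketch of how the shapes evolve across the two steps (variable names are made up for illustration, this is not xtuner or transformers code):

```python
import torch

bsz, num_heads, prompt_len = 1, 32, 40

# Step 0: prompt pass, no cache yet -> scores and mask are both 40 wide.
q_len, kv_len = prompt_len, prompt_len
scores = torch.zeros(bsz, num_heads, q_len, kv_len)  # [1, 32, 40, 40]
mask = torch.zeros(bsz, 1, q_len, kv_len)            # [1, 1, 40, 40]
print((scores + mask).shape)                         # broadcasts fine

# Step 1: incremental decode with the KV cache -> only the new token is fed in,
# so q_len is 1 while the key dimension grows to prompt_len + 1 = 41.
q_len, kv_len = 1, prompt_len + 1
scores = torch.zeros(bsz, num_heads, q_len, kv_len)  # [1, 32, 1, 41]
good_mask = torch.zeros(bsz, 1, q_len, kv_len)       # [1, 1, 1, 41] -- what the layer needs
stale_mask = torch.zeros(bsz, 1, q_len, prompt_len)  # [1, 1, 1, 40] -- what the log shows
print((scores + good_mask).shape)                    # works
# (scores + stale_mask) raises the RuntimeError from the traceback above
```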

Stardust-y · Feb 26 '25 19:02

02/27 03:06:07 - mmengine - INFO - before_train in EvaluateChatHook.
hidden s, am torch.Size([1, 40, 4096]) torch.Size([1, 1, 40, 40]) shape torch.Size([1, 1, 40, 40]) torch.Size([1, 32, 40, 40]) torch.Size([1, 1, 40, 40])
(this line repeats with identical shapes for every subsequent print during the 40-token prompt pass; only the final line, below, differs)
hidden s, am torch.Size([1, 1, 4096]) torch.Size([1, 1, 1, 40]) shape torch.Size([1, 1, 1, 40]) torch.Size([1, 32, 1, 41]) torch.Size([1, 1, 1, 40])

Stardust-y · Feb 26 '25 19:02