Streamer-Sales

Problem when reproducing the training process

AriesJin opened this issue 9 months ago · 1 comment

While trying to reproduce the training process I ran into a problem: every time training reaches the evaluation step (EvaluateChatHook), it throws the error below.

03/10 11:11:05 - mmengine - INFO - before_train in EvaluateChatHook.
forword InternLM2RotaryEmbedding()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/train.py", line 360, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/xtuner/tools/train.py", line 356, in main
    runner.train()
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/_flexible_runner.py", line 1200, in train
    model = self.train_loop.run()  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 273, in run
    self.runner.call_hook('before_train')
  File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/_flexible_runner.py", line 1271, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 230, in before_train
    self._generate_samples(runner, max_new_tokens=50)
  File "/usr/local/lib/python3.10/dist-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 219, in _generate_samples
    self._eval_language(runner, model, device, max_new_tokens,
  File "/usr/local/lib/python3.10/dist-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 177, in _eval_language
    generation_output = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1148, in generate
    outputs = self.base_model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm/internlm2-chat-1_8b/889898a85c5b880178a87e8c525acd5acb7a0096/modeling_internlm2.py", line 1218, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm/internlm2-chat-1_8b/889898a85c5b880178a87e8c525acd5acb7a0096/modeling_internlm2.py", line 1014, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm/internlm2-chat-1_8b/889898a85c5b880178a87e8c525acd5acb7a0096/modeling_internlm2.py", line 748, in forward
    hidden_states, self_attn_weights, present_key_value = self.attention(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm/internlm2-chat-1_8b/889898a85c5b880178a87e8c525acd5acb7a0096/modeling_internlm2.py", line 332, in forward
    cos, sin = self.rotary_emb(value_states, position_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xtuner/model/modules/dispatch/internlm2.py", line 44, in forward
    if (seq_len > self.max_seq_len_cached
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
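Reading the last two frames together: modeling_internlm2.py calls self.rotary_emb(value_states, position_ids), while the forward that actually runs (xtuner/model/modules/dispatch/internlm2.py) compares its second argument against self.max_seq_len_cached as if it were an integer seq_len. Since position_ids is a tensor with more than one element, the if statement raises the RuntimeError above. A minimal sketch that reproduces the same error (values are illustrative only):

```python
import torch

max_seq_len_cached = 2048                     # illustrative cache size
position_ids = torch.arange(8).unsqueeze(0)   # shape (1, seq_len); what transformers passes

# The dispatched forward treats this tensor as an int seq_len. Comparing a
# multi-element tensor inside `if` raises:
# RuntimeError: Boolean value of Tensor with more than one value is ambiguous
if position_ids > max_seq_len_cached:
    pass
```

This looks like a mismatch between the installed transformers version and the xtuner dispatch code, though I have not verified which combination is expected.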

However, when I print self.rotary_emb on the last line of InternLM2Attention's __init__, the result is always InternLM2DynamicNTKScalingRotaryEmbedding(), i.e. the dynamic version is being used. So where does the InternLM2RotaryEmbedding used at evaluation time get initialized??
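For reference, a hedged way to check which class and which forward are actually in effect at eval time (the attribute path below is an assumption based on the traceback; adjust it to the real model object, e.g. after unwrapping the PEFT wrapper):

```python
# Hypothetical inspection sketch: attribute names follow modeling_internlm2.py in the
# traceback; `model` is assumed to be the underlying InternLM2ForCausalLM.
attn = model.model.layers[0].attention
rote = attn.rotary_emb

print(type(rote))                     # the class the instance was constructed as
print(type(rote).forward.__module__)  # where the forward that actually runs is defined;
                                      # if xtuner patched it, this may point at
                                      # xtuner.model.modules.dispatch.internlm2
```

If the second print points at xtuner's dispatch module, the InternLM2RotaryEmbedding() seen in the log is likely something xtuner substitutes or patches in after the HF model is built, which would explain why it differs from the InternLM2DynamicNTKScalingRotaryEmbedding printed in __init__.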

AriesJin · Mar 10 '25

Same question here. Have you found a solution?

equinox-sun · Apr 03 '25