MOSS 推理时报错RuntimeError: CUDA error: device-side assert triggered

推理时报错RuntimeError: CUDA error: device-side assert triggered

Open liuxunchenglxc opened this issue 1 year ago • 7 comments

使用的模型是003的plugin int4模型，显卡是3090，报错时显存显示占用了12.5G。具体报错信息如下： /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [272,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. Exception in thread Thread-7 (generate): Traceback (most recent call last): File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/opt/conda/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs) File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, **kwargs) File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate return self.sample( File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample outputs = self( File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, **kwargs) File "/root/.cache/huggingface/modules/transformers_modules/moss-003-sft-plugin-int4-g/modeling_moss.py", line 674, in forward transformer_outputs = self.transformer( File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, **kwargs) File "/root/.cache/huggingface/modules/transformers_modules/moss-003-sft-plugin-int4-g/modeling_moss.py", line 545, in forward outputs = block( File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, **kwargs) File "/root/.cache/huggingface/modules/transformers_modules/moss-003-sft-plugin-int4-g/modeling_moss.py", line 270, in forward attn_outputs = self.attn( File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, **kwargs) File "/root/.cache/huggingface/modules/transformers_modules/moss-003-sft-plugin-int4-g/modeling_moss.py", line 192, in forward k_rot = apply_rotary_pos_emb(k_rot, sin, cos) File "/root/.cache/huggingface/modules/transformers_modules/moss-003-sft-plugin-int4-g/modeling_moss.py", line 56, in apply_rotary_pos_emb sin = torch.repeat_interleave(sin[:, :, None, :], 2, 3) RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Apr 30 '23 03:04 liuxunchenglxc

同样碰到这个问题，貌似只要提交的文本很长时候，就会出现这个bug，而且只要报错后，再提问“你是谁”也会报错

May 05 '23 02:05 ChenBinfighting1

类似的问题： python3 moss_cli_demo.py --model_name fnlp/moss-moon-003-sft --gpu 0,1 Waiting for all devices to be ready, it may take a few minutes... 欢迎使用 MOSS 人工智能助手！输入内容即可进行对话。输入 clear 以清空对话历史，输入 stop 以终止对话。 <|Human|>: hello ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [32,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [33,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [34,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [35,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [36,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [37,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [38,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [39,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [40,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [41,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [42,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [43,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [44,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [45,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [46,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [47,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [48,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [49,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [50,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [51,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [52,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [53,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [54,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [55,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [56,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [57,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [58,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [59,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [60,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [61,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [62,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [63,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [1,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [2,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [3,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [5,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [6,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [7,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [9,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [10,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [11,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [12,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [13,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [14,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [15,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [16,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [17,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [18,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [19,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [20,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [21,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [22,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [23,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [24,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [25,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [26,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [27,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [28,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [29,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [30,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [31,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /usr/local/MOSS-main/moss_cli_demo.py:97 in │ │ │ │ 94 │ │ │ print(response.lstrip('\n')) │ │ 95 │ │ 96 if name == "main": │ │ ❱ 97 │ main() │ │ 98 │ │ │ │ /usr/local/MOSS-main/moss_cli_demo.py:80 in main │ │ │ │ 77 │ │ prompt += '<|Human|>: ' + query + '' │ │ 78 │ │ inputs = tokenizer(prompt, return_tensors="pt") │ │ 79 │ │ with torch.no_grad(): │ │ ❱ 80 │ │ │ outputs = model.generate( │ │ 81 │ │ │ │ inputs.input_ids.cuda(), │ │ 82 │ │ │ │ attention_mask=inputs.attention_mask.cuda(), │ │ 83 │ │ │ │ max_length=2048, │ │ │ │ /usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py:115 in decorate_context │ │ │ │ 112 │ @functools.wraps(func) │ │ 113 │ def decorate_context(*args, **kwargs): │ │ 114 │ │ with ctx_factory(): │ │ ❱ 115 │ │ │ return func(*args, **kwargs) │ │ 116 │ │ │ 117 │ return decorate_context │ │ 118 │ │ │ │ /usr/local/lib/python3.9/site-packages/transformers/generation/utils.py:1485 in generate │ │ │ │ 1482 │ │ │ ) │ │ 1483 │ │ │ │ │ 1484 │ │ │ # 13. run sample │ │ ❱ 1485 │ │ │ return self.sample( │ │ 1486 │ │ │ │ input_ids, │ │ 1487 │ │ │ │ logits_processor=logits_processor, │ │ 1488 │ │ │ │ logits_warper=logits_warper, │ │ │ │ /usr/local/lib/python3.9/site-packages/transformers/generation/utils.py:2524 in sample │ │ │ │ 2521 │ │ │ model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs) │ │ 2522 │ │ │ │ │ 2523 │ │ │ # forward pass to get next token │ │ ❱ 2524 │ │ │ outputs = self( │ │ 2525 │ │ │ │ **model_inputs, │ │ 2526 │ │ │ │ return_dict=True, │ │ 2527 │ │ │ │ output_attentions=output_attentions, │ │ │ │ /usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │ │ │ │ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │ │ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │ │ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │ │ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │ │ 1502 │ │ # Do not call functions when jit is used │ │ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │ │ 1504 │ │ backward_pre_hooks = [] │ │ │ │ /usr/local/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward │ │ │ │ 162 │ │ │ with torch.no_grad(): │ │ 163 │ │ │ │ output = old_forward(*args, **kwargs) │ │ 164 │ │ else: │ │ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │ │ 166 │ │ return module._hf_hook.post_forward(module, output) │ │ 167 │ │ │ 168 │ module.forward = new_forward │ │ │ │ /usr/local/MOSS-main/models/modeling_moss.py:678 in forward │ │ │ │ 675 │ │ """ │ │ 676 │ │ return_dict = return_dict if return_dict is not None else self.config.use_return │ │ 677 │ │ │ │ ❱ 678 │ │ transformer_outputs = self.transformer( │ │ 679 │ │ │ input_ids, │ │ 680 │ │ │ past_key_values=past_key_values, │ │ 681 │ │ │ attention_mask=attention_mask, │ │ │ │ /usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │ │ │ │ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │ │ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │ │ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │ │ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │ │ 1502 │ │ # Do not call functions when jit is used │ │ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │ │ 1504 │ │ backward_pre_hooks = [] │ │ │ │ /usr/local/MOSS-main/models/modeling_moss.py:545 in forward │ │ │ │ 542 │ │ │ │ │ head_mask[i], │ │ 543 │ │ │ │ ) │ │ 544 │ │ │ else: │ │ ❱ 545 │ │ │ │ outputs = block( │ │ 546 │ │ │ │ │ hidden_states=hidden_states, │ │ 547 │ │ │ │ │ layer_past=layer_past, │ │ 548 │ │ │ │ │ attention_mask=attention_mask, │ │ │ │ /usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │ │ │ │ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │ │ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │ │ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │ │ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │ │ 1502 │ │ # Do not call functions when jit is used │ │ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │ │ 1504 │ │ backward_pre_hooks = [] │ │ │ │ /usr/local/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward │ │ │ │ 162 │ │ │ with torch.no_grad(): │ │ 163 │ │ │ │ output = old_forward(*args, **kwargs) │ │ 164 │ │ else: │ │ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │ │ 166 │ │ return module._hf_hook.post_forward(module, output) │ │ 167 │ │ │ 168 │ module.forward = new_forward │ │ │ │ /usr/local/MOSS-main/models/modeling_moss.py:270 in forward │ │ │ │ 267 │ ) -> Union[Tuple[torch.Tensor], Optional[Tuple[torch.Tensor, Tuple[torch.FloatTensor │ │ 268 │ │ residual = hidden_states │ │ 269 │ │ hidden_states = self.ln_1(hidden_states) │ │ ❱ 270 │ │ attn_outputs = self.attn( │ │ 271 │ │ │ hidden_states=hidden_states, │ │ 272 │ │ │ layer_past=layer_past, │ │ 273 │ │ │ attention_mask=attention_mask, │ │ │ │ /usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py:1501 in _call_impl │ │ │ │ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │ │ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │ │ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │ │ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │ │ 1502 │ │ # Do not call functions when jit is used │ │ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │ │ 1504 │ │ backward_pre_hooks = [] │ │ │ │ /usr/local/lib/python3.9/site-packages/accelerate/hooks.py:165 in new_forward │ │ │ │ 162 │ │ │ with torch.no_grad(): │ │ 163 │ │ │ │ output = old_forward(*args, **kwargs) │ │ 164 │ │ else: │ │ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │ │ 166 │ │ return module._hf_hook.post_forward(module, output) │ │ 167 │ │ │ 168 │ module.forward = new_forward │ │ │ │ /usr/local/MOSS-main/models/modeling_moss.py:192 in forward │ │ │ │ 189 │ │ │ q_rot = query[:, :, :, : self.rotary_dim] │ │ 190 │ │ │ q_pass = query[:, :, :, self.rotary_dim :] │ │ 191 │ │ │ │ │ ❱ 192 │ │ │ k_rot = apply_rotary_pos_emb(k_rot, sin, cos) │ │ 193 │ │ │ q_rot = apply_rotary_pos_emb(q_rot, sin, cos) │ │ 194 │ │ │ │ │ 195 │ │ │ key = torch.cat([k_rot, k_pass], dim=-1) │ │ │ │ /usr/local/MOSS-main/models/modeling_moss.py:56 in apply_rotary_pos_emb │ │ │ │ 53 │ │ 54 # Copied from transformers.models.gptj.modeling_gptj.apply_rotary_pos_emb │ │ 55 def apply_rotary_pos_emb(tensor: torch.Tensor, sin: torch.Tensor, cos: torch.Tensor) -> │ │ ❱ 56 │ sin = torch.repeat_interleave(sin[:, :, None, :], 2, 3) │ │ 57 │ cos = torch.repeat_interleave(cos[:, :, None, :], 2, 3) │ │ 58 │ return (tensor * cos) + (rotate_every_two(tensor) * sin) │ │ 59 │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.