LLaMA RuntimeError: CUDA error: device-side assert triggered
I am interested in working with the Arabic language. I added all the new tokens to the tokenizer, and the tokenizer itself seems to work fine. However, during training I hit the error below, and I am looking for a way to resolve it.
  0%| | 0/1524 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "alpaca-lora/finetune.py", line 234, in <module>
    fire.Fire(train)
  File ".local/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/c703/c7031420/.local/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/c703/c7031420/.local/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "alpaca-lora/finetune.py", line 203, in train
    trainer.train()
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
    loss = self.compute_loss(model, inputs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
    outputs = model(**inputs)
  File ".conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File ".conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File ".conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 574, in forward
    attention_mask = self._prepare_decoder_attention_mask(
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 476, in _prepare_decoder_attention_mask
    combined_attention_mask = _make_causal_mask(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
- transformers version: 4.28.0.dev0
- Platform: Linux-4.18.0-372.16.1.el8_6.0.1.x86_64-x86_64-with-glibc2.28
- Python version: 3.9.7
- Huggingface_hub version: 0.13.3
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Your log suggests your tokenizer does not actually work fine: your input failed as soon as it reached the attention-mask preparation, which means the tokenizer couldn't even convert your sentence into a valid tensor for the model.
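A quick way to verify this is to compare the largest token id the tokenizer produces against the size of the model's embedding matrix; a minimal sketch, assuming a placeholder checkpoint path and a made-up Arabic sample sentence:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama")  # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/llama")

# An id >= the embedding size is exactly what triggers the device-side
# assert once the batch reaches the GPU.
ids = tokenizer("جملة تجريبية بالعربية", return_tensors="pt").input_ids
print(ids.max().item(), model.get_input_embeddings().num_embeddings)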
I have encountered the same error as you. Have you resolved this error now?
@YSLLYW Yes, if you add new tokens to your tokenizer, you should resize the model's embeddings:
model.resize_token_embeddings(len(tokenizer))
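A minimal sketch of the whole flow (the checkpoint path and the token list are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama")  # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/llama")

# Register the new tokens with the tokenizer.
num_added = tokenizer.add_tokens(["توكن١", "توكن٢"])  # placeholder tokens

# Grow the embedding matrix so the new ids map to real rows; skipping this
# leaves ids past the end of the old vocabulary, and CUDA reports the
# out-of-range lookup as a device-side assert.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))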
Thank you very much for your answer. Did you also encounter this error in a multi-GPU environment? What specific part of the code should I modify?
@YSLLYW did you change the original tokenizer or not?
Yes, I edited the tokenizer_config and set the value of 'tokenizer_class' to "LLaMATokenizer".
Hello, following your suggestion, the error is still reported in the multi-GPU case. How can I resolve this issue?
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
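As the message itself suggests, you can get an accurate stack trace by forcing synchronous kernel launches, either by prefixing the launch command with CUDA_LAUNCH_BLOCKING=1 or, as a sketch, by setting it at the very top of finetune.py:

import os

# Must be set before torch initializes CUDA, hence before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch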
@YSLLYW I met the same problem, have you resolved it?
Are you training with multiple GPUs?
@YSLLYW Yes, I figured it out. You should launch with torchrun.
Yes, use torchrun --nproc_per_node=2 --master_port=1234 finetune.py
Using torchrun is not model parallelism, though. In other environments I can run python finetune.py directly; the code uses device_map = "auto".
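For reference, a sketch of that loading style (the checkpoint path is a placeholder; load_in_8bit mirrors what alpaca-lora's finetune.py does):

from transformers import AutoModelForCausalLM

# device_map="auto" shards the layers across all visible GPUs within a
# single process (naive model parallelism), so no torchrun launcher is
# involved.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama",     # placeholder checkpoint path
    device_map="auto",
    load_in_8bit=True,   # as in alpaca-lora's finetune.py
)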