LLaMA RuntimeError: CUDA error: device-side assert triggered
I am interested in working with the Arabic language. I added all the new tokens to the tokenizer, and the tokenizer itself seems to work fine. However, during training I hit the error below, and I am looking for a way to resolve it.
  0%| | 0/1524 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "alpaca-lora/finetune.py", line 234, in <module>
    fire.Fire(train)
  File ".local/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/c703/c7031420/.local/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/c703/c7031420/.local/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "alpaca-lora/finetune.py", line 203, in train
    trainer.train()
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
    loss = self.compute_loss(model, inputs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
    outputs = model(**inputs)
  File ".conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File ".conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File ".conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 574, in forward
    attention_mask = self._prepare_decoder_attention_mask(
  File ".conda/envs/llama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 476, in _prepare_decoder_attention_mask
    combined_attention_mask = _make_causal_mask(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
- transformers version: 4.28.0.dev0
- Platform: Linux-4.18.0-372.16.1.el8_6.0.1.x86_64-x86_64-with-glibc2.28
- Python version: 3.9.7
- Huggingface_hub version: 0.13.3
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Your log suggests your tokenizer does not actually work fine: your input failed as soon as it reached the attention-mask preparation, which means the tokenizer couldn't even convert your sentence into a valid tensor for the model.
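A quick way to verify this is to compare the largest token id the tokenizer produces against the size of the model's embedding matrix; a minimal sketch, assuming a placeholder checkpoint path and a made-up Arabic sample sentence:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama")  # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/llama")

# An id >= the embedding size is exactly what triggers the device-side
# assert once the batch reaches the GPU.
ids = tokenizer("جملة تجريبية بالعربية", return_tensors="pt").input_ids
print(ids.max().item(), model.get_input_embeddings().num_embeddings)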
I have encountered the same error as you. Have you resolved this error now?
@YSLLYW Yes, if you add new tokens to your tokenizer, you should resize the model's embeddings:
model.resize_token_embeddings(len(tokenizer))
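A minimal sketch of the whole flow (the checkpoint path and the token list are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama")  # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/llama")

# Register the new tokens with the tokenizer.
num_added = tokenizer.add_tokens(["توكن١", "توكن٢"])  # placeholder tokens

# Grow the embedding matrix so the new ids map to real rows; skipping this
# leaves ids past the end of the old vocabulary, and CUDA reports the
# out-of-range lookup as a device-side assert.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))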
Thank you very much for your answer. Did you also encounter this error in a multi-GPU environment? What specific part of the code should I modify?
@YSLLYW did you change the original tokenizer or not?
Yes, I edited the tokenizer_config and set the value of 'tokenizer_class' to "LLaMATokenizer".
Hello, following your suggestion, the error is still reported in the multi-GPU case. How can I resolve this issue?
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
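As the message itself suggests, you can get an accurate stack trace by forcing synchronous kernel launches, either by prefixing the launch command with CUDA_LAUNCH_BLOCKING=1 or, as a sketch, by setting it at the very top of finetune.py:

import os

# Must be set before torch initializes CUDA, hence before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch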
@YSLLYW I met the same problem, have you resolved it?
Are you training with multiple GPUs?
@YSLLYW Yes, I figured it out. You should launch with torchrun.
Yes, use torchrun --nproc_per_node=2 --master_port=1234 finetune.py
Using torchrun is not model parallelism, though. In other environments I can run python finetune.py directly; the code uses device_map = "auto".
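For reference, a sketch of that loading style (the checkpoint path is a placeholder; load_in_8bit mirrors what alpaca-lora's finetune.py does):

from transformers import AutoModelForCausalLM

# device_map="auto" shards the layers across all visible GPUs within a
# single process (naive model parallelism), so no torchrun launcher is
# involved.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama",     # placeholder checkpoint path
    device_map="auto",
    load_in_8bit=True,   # as in alpaca-lora's finetune.py
)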