RuntimeError: CUDA driver error: invalid argument
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
- llamafactory version: 0.9.3.dev0
- Platform: Linux-5.4.250-2-velinux1u1-amd64-x86_64-with-glibc2.31
- Python version: 3.10.16
- PyTorch version: 2.6.0+cu124 (GPU)
- Transformers version: 4.51.3
- Datasets version: 3.5.0
- Accelerate version: 1.6.0
- PEFT version: 0.15.1
- TRL version: 0.9.6
- GPU type: NVIDIA A800-SXM4-80GB
- GPU number: 4
- GPU memory: 79.35GB
- DeepSpeed version: 0.16.6
- vLLM version: 0.8.4
- Git commit: b6a10d1732328533961d0ce71819803e9d9883cc
Reproduction
I am using VQA (Visual Question Answering) type data for SFT (Supervised Fine-Tuning). The format of the dataset is as follows:
```json
{
  "images": ["list of image paths"],
  "messages": [
    {
      "role": "system/assistant/user",
      "content": "content_text"
    }
  ]
}
```
When I perform SFT, some data can be correctly fine-tuned, but some data will produce the following error:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/my_local_path/vts/train/sft/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank0]: launch()
[rank0]: File "/my_local_path/vts/train/sft/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/my_local_path/vts/train/sft/LLaMA-Factory/src/llamafactory/train/tuner.py", line 107, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/my_local_path/vts/train/sft/LLaMA-Factory/src/llamafactory/train/tuner.py", line 69, in _training_function
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/my_local_path/vts/train/sft/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 102, in run_sft
[rank0]: train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 2245, in train
[rank0]: return inner_training_loop(
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 2560, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 3736, in training_step
[rank0]: loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank0]: File "/my_local_path/vts/train/sft/LLaMA-Factory/src/llamafactory/train/sft/trainer.py", line 101, in compute_loss
[rank0]: return super().compute_loss(model, inputs, *args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 3801, in compute_loss
[rank0]: outputs = model(**inputs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
[rank0]: ret_val = func(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2054, in forward
[rank0]: loss = self.module(*inputs, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
[rank0]: return inner()
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1793, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/peft_model.py", line 1756, in forward
[rank0]: return self.base_model(
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
[rank0]: return inner()
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1793, in inner
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 193, in forward
[rank0]: return self.model.forward(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1854, in forward
[rank0]: loss = loss_fct(shift_logits, shift_labels)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/loss.py", line 1295, in forward
[rank0]: return F.cross_entropy(
[rank0]: File "/my_local_path/miniconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/functional.py", line 3494, in cross_entropy
[rank0]: return torch._C._nn.cross_entropy_loss(
[rank0]: RuntimeError: CUDA driver error: invalid argument
```
Why is this happening?
Others
No response
Same problem with the latest version...
Have you fixed this bug?
Could you find out which data leads to this error?
In fact, the error occurs on the very first sample of my dataset. The dataset is a JSON file containing 223,000 entries. Below is the command I executed:
```bash
llamafactory-cli train \
--stage sft \
--do_train True \
--model_name_or_path /my_local_path/model/Qwen2.5-VL-7B-Instruct \
--preprocessing_num_workers 16 \
--finetuning_type lora \
--template qwen2_vl \
--flash_attn auto \
--dataset_dir data \
--dataset vts_qwen72b_gen_223k \
--cutoff_len 8192 \
--learning_rate 5e-05 \
--num_train_epochs 3.0 \
--max_samples 1000000 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 5 \
--save_steps 50 \
--warmup_steps 0 \
--packing False \
--report_to none \
--output_dir saves/Qwen2.5-VL-7B-Instruct/lora/qwen_7b_sft_223k \
--bf16 True \
--plot_loss True \
--trust_remote_code True \
--ddp_timeout 180000000 \
--include_num_input_tokens_seen True \
--optim adamw_torch \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0 \
--lora_target all \
--val_size 0.1 \
--eval_strategy steps \
--eval_steps 50 \
--per_device_eval_batch_size 4 \
--deepspeed cache/ds_z3_config.json
```
But I previously ran a dataset with 2,000 entries in the same format, and it trained successfully without any issues.
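Since the 2,000-entry dataset works but the 223,000-entry one fails immediately, a minimal sanity scan over the larger file may help isolate bad records. This is just a sketch under assumptions: the JSON file is a top-level list of records with "images" and "messages" fields, and the dataset path below is a placeholder guessed from the command above.

```python
# Sketch: scan a VQA SFT dataset for records that commonly break multimodal
# training: missing image files, empty messages, or a mismatch between the
# number of images and the number of <image> placeholders in the text.
import json
import os

with open("data/vts_qwen72b_gen_223k.json", encoding="utf-8") as f:  # placeholder path
    records = json.load(f)

for i, rec in enumerate(records):
    images = rec.get("images", [])
    messages = rec.get("messages", [])
    if not messages:
        print(i, "has no messages")
        continue
    missing = [p for p in images if not os.path.exists(p)]
    if missing:
        print(i, "missing image files:", missing)
    n_tags = sum(str(m.get("content", "")).count("<image>") for m in messages)
    if n_tags != len(images):
        print(i, f"{len(images)} images but {n_tags} <image> tags")
```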
I get the same problem in 0.9.3.dev0. How can it be fixed?
```
[rank1]: Traceback (most recent call last):
[rank1]: File "/cloud/data2/ethan/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
```
Have you added special tokens to your tokenizer without resizing the LM embeddings? That leads to a mismatch between the label ids and the lm_head output size. It seems these errors are all raised from nn.functional.cross_entropy.
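Not from this thread, just a minimal sketch of that check, under the assumptions that the model path is the one from the command above and that the `labels` tensor below is a placeholder for a batch produced by the data collator:

```python
# Sketch: compare tokenizer length / model vocab size and check that no label
# id falls outside the lm_head range (-100 is the usual ignore_index).
import torch
from transformers import AutoConfig, AutoTokenizer

model_path = "/my_local_path/model/Qwen2.5-VL-7B-Instruct"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)
vocab_size = config.vocab_size  # for some multimodal configs this may live under config.text_config

print("tokenizer length:", len(tokenizer), "| model vocab_size:", vocab_size)

labels = torch.tensor([[-100, 151644, 872, 151645]])  # placeholder labels
valid = labels[labels != -100]
assert valid.numel() == 0 or (valid.min() >= 0 and valid.max() < vocab_size), \
    "label id falls outside the lm_head range"

# If new special tokens were really added, the embeddings must be resized to match:
#   model.resize_token_embeddings(len(tokenizer))
```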
Hello, I hit the same issue, and I didn't change anything you mentioned about the tokenizer. Although I got the same error logs, I noticed a difference between my worker 0 and worker 1 nodes.
On worker 0 the error was `RuntimeError: CUDA driver error: invalid argument` at `shift_logits = logits[..., :-1, :].contiguous()`,
while on worker 1 the error occurred at `return torch._C._nn.cross_entropy_loss(` and was CUDA out of memory.
So I changed ZeRO from stage 1 to stage 3 (with offload) and it worked.
It seems the OOM was not surfaced on worker 0, which made worker 0's error message deceptive and confusing.
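For reference, a sketch of such a ZeRO stage-3 + CPU offload config, written as Python that dumps the JSON file which `--deepspeed` expects. The field names follow the public DeepSpeed configuration schema; the output path is a placeholder.

```python
# Sketch: generate a ZeRO stage-3 + CPU offload DeepSpeed config file.
# "auto" values are filled in by the Hugging Face Trainer integration.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

with open("cache/ds_z3_offload_config.json", "w") as f:  # placeholder path
    json.dump(ds_config, f, indent=2)
```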
How do you monitor the problems of different workers? I only see the error of one worker...
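Not an answer from this thread, but a common way to make this kind of asynchronous CUDA error point at the real failing rank and kernel is to set two environment variables before CUDA is initialized, sketched here in Python (placing it at the top of the launcher is an assumption; exporting them in the shell works too):

```python
# Sketch: make CUDA errors synchronous and distributed logging more verbose.
# Both are standard PyTorch/CUDA knobs; they must be set before the first CUDA call.
import os

os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")          # report the error at the kernel that actually failed
os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")  # extra per-rank distributed debug output
```

With CUDA_LAUNCH_BLOCKING=1, the traceback stops at the operation that really failed instead of a later, unrelated call such as cross_entropy.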
Reducing the batch_size works for me.
same issue
File "/mnt/petrelfs/qiupengcheng/anaconda3/envs/lsy/lib/python3.10/site-packages/torch/optim/lr_scheduler
.py", line 140, in wrapper
return func.__get__(opt, opt.__class__)(*args, **kwargs)
File "/mnt/petrelfs/qiupengcheng/anaconda3/envs/lsy/lib/python3.10/site-packages/torch/optim/optimizer.py
", line 493, in wrapper
out = func(*args, **kwargs)
File "/mnt/petrelfs/qiupengcheng/anaconda3/envs/lsy/lib/python3.10/site-packages/torch/optim/optimizer.py
", line 91, in _use_grad
ret = func(self, *args, **kwargs)
File "/mnt/petrelfs/qiupengcheng/anaconda3/envs/lsy/lib/python3.10/site-packages/torch/optim/adamw.py", l
ine 243, in step
adamw(
File "/mnt/petrelfs/qiupengcheng/anaconda3/envs/lsy/lib/python3.10/site-packages/torch/optim/optimizer.py
", line 154, in maybe_fallback
return func(*args, **kwargs)
File "/mnt/petrelfs/qiupengcheng/anaconda3/envs/lsy/lib/python3.10/site-packages/torch/optim/adamw.py", l
ine 875, in adamw
func(
File "/mnt/petrelfs/qiupengcheng/anaconda3/envs/lsy/lib/python3.10/site-packages/torch/optim/adamw.py", l
ine 699, in _multi_tensor_adamw
exp_avg_sq_sqrt = torch._foreach_sqrt(device_exp_avg_sqs)
RuntimeError: CUDA driver error: invalid argument
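This last trace fails inside the multi-tensor (foreach) AdamW path rather than in the loss, so it may be a different trigger for the same driver error. One hedged workaround, not verified in this thread, is to force the single-tensor implementation by constructing the optimizer with foreach=False, sketched below in plain PyTorch (`model` is a placeholder; in LLaMA-Factory the optimizer is built by the HF Trainer, so this would need to be wired in through a custom optimizer):

```python
# Sketch only: disable the multi-tensor (foreach) AdamW kernels,
# which is where the traceback above fails.
import torch

model = torch.nn.Linear(8, 8)  # placeholder module
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,
    foreach=False,  # fall back to the single-tensor update path
)
```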