Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
Command used:
deepspeed --num_gpus=8 src/train_bash.py \
    --stage sft \
    --model_name_or_path /train/Llama-3-70B \
    --do_train \
    --dataset thp \
    --finetuning_type full \
    --output_dir llama3_0419 \
    --overwrite_cache \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 10 \
    --eval_steps 10 \
    --val_size 1000 \
    --learning_rate 5e-6 \
    --max_grad_norm 0.5 \
    --num_train_epochs 3.0 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --bf16 \
    --template default \
    --deepspeed deepspeed_3.json
[INFO|trainer.py:2057] 2024-04-20 09:12:27,424 >> Number of trainable parameters = 70,553,706,496
0%|          | 0/123 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/train_new/github/LLaMA-Factory/src/train_bash.py", line 14, in
main()
File "/train_new/github/LLaMA-Factory/src/train_bash.py", line 5, in main
run_exp()
File "/train_new/github/LLaMA-Factory/src/llmtuner/train/tuner.py", line 33, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/train_new/github/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 71, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
self.accelerator.backward(loss)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/accelerator.py", line 2007, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 175, in backward
self.engine.step()
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2169, in step
self._take_model_step(lr_kwargs)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2075, in take_model_step
self.optimizer.step()
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2047, in step
self.unscale_and_clip_grads(sub_group_id, scaled_global_grad_norm)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2117, in unscale_and_clip_grads
self.fp32_partitioned_groups_flat[sub_group_id].grad.mul_(1. / combined_scale)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
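The failure is inside DeepSpeed's ZeRO-3 unscale_and_clip_grads: the flattened fp32 gradient partition appears to live on the CPU (as it would with optimizer offload) while the combined unscaling factor is a CUDA tensor, so the in-place multiply mixes devices. Below is a minimal sketch of the same mismatch, assuming a CUDA device is available (illustration only, not LLaMA-Factory or DeepSpeed code):

```python
import torch

# Hypothetical repro of the device mismatch: a CPU-resident fp32 gradient
# partition multiplied in place by a CUDA-resident unscaling factor.
grad = torch.zeros(8)                              # CPU tensor, like an offloaded fp32 grad partition
combined_scale = torch.tensor(2.0, device="cuda")  # CUDA tensor, like the clipped global-norm scale
grad.mul_(1.0 / combined_scale)                    # RuntimeError: Expected all tensors to be on the same device ...
```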
Expected behavior
No response
System Info
No response
Others
DeepSpeed version: 0.14.0
See also: https://github.com/hiyouga/LLaMA-Factory/issues/2493#issuecomment-1950971296
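The deepspeed_3.json used in the command above is not shown; the sketch below is an assumed, typical ZeRO-3 configuration for this kind of run, not the actual file. Whether offload_optimizer / offload_param point at "cpu" is the detail most relevant to the error, since CPU offload is what places the fp32 optimizer partitions off the GPU.

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```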