
[BUG] LLaMA-Factory SFT training error

Open · hm1229 opened this issue 1 year ago · 3 comments

Is there an existing issue / discussion for this?

  • [x] I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • [x] I have searched the FAQ

Current Behavior

I followed the LoRA sft.yaml given in https://github.com/OpenBMB/MiniCPM-o/blob/main/docs/llamafactory_train_and_infer.md and installed the environment described in https://github.com/OpenBMB/MiniCPM-o/issues/807. During LoRA SFT the following error occurred; I suspect a version problem with the accelerate library?

```
/data3/utils/LLaMA-Factory/src/llamafactory/data/mm_plugin.py:669: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  audio_feature_lens = [torch.tensor(audio_feature_len) for audio_feature_len in audio_feature_lens]
Traceback (most recent call last):
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 156, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
TypeError: BatchEncoding.to() got an unexpected keyword argument 'non_blocking'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data1/anaconda3/envs/minitrain/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/data3/utils/LLaMA-Factory/src/llamafactory/cli.py", line 112, in main
    run_exp()
  File "/data3/utils/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/data3/utils/LLaMA-Factory/src/llamafactory/train/tuner.py", line 67, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data3/utils/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 102, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/trainer.py", line 2052, in train
    return inner_training_loop(
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/trainer.py", line 2345, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/data_loader.py", line 561, in __iter__
    current_batch = send_to_device(current_batch, self.device, non_blocking=self._non_blocking)
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 184, in send_to_device
    {
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 185, in <dictcomp>
    k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/accelerate/utils/operations.py", line 158, in send_to_device
    return tensor.to(device)
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 816, in to
    self.data = {k: v.to(device=device) for k, v in self.data.items() if v is not None}
  File "/data1/anaconda3/envs/minitrain/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 816, in <dictcomp>
    self.data = {k: v.to(device=device) for k, v in self.data.items() if v is not None}
AttributeError: 'list' object has no attribute 'to'
```
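The exception chain above can be reproduced in miniature. This is a hedged sketch, not the real transformers/accelerate code: `OldBatchEncoding` and `FakeTensor` are hypothetical stand-ins, and `send_to_device` only mirrors the try/except fallback visible in the traceback (`operations.py` lines 156 and 158). The point is that a `.to()` without a `non_blocking` parameter raises `TypeError`, and the fallback `.to(device)` then trips over a plain-list value (like `audio_feature_lens`) that has no `.to()` at all:

```python
class FakeTensor:
    """Hypothetical tensor stand-in so the sketch runs without torch."""
    def to(self, device=None, non_blocking=False):
        return self


class OldBatchEncoding:
    """Hypothetical stand-in for an older BatchEncoding: .to() takes only
    `device` and blindly calls .to() on every stored value."""
    def __init__(self, data):
        self.data = data

    def to(self, device):  # no `non_blocking` parameter -> first TypeError
        self.data = {k: v.to(device=device) for k, v in self.data.items() if v is not None}
        return self


def send_to_device(obj, device, non_blocking=False):
    """Mirrors the try/except fallback seen in the traceback."""
    try:
        return obj.to(device, non_blocking=non_blocking)  # TypeError here
    except TypeError:
        return obj.to(device)  # fallback then hits the list value


batch = OldBatchEncoding({"input_ids": FakeTensor(), "audio_feature_lens": [3, 5]})
try:
    send_to_device(batch, "cuda:0", non_blocking=True)
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'to'
```

Under this reading, either accelerate must stop forwarding `non_blocking` to objects that don't accept it, or the batch container must learn to skip non-tensor values; upgrading transformers (as suggested below in this thread) addresses it on the container side.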

Expected Behavior

No response

Steps To Reproduce

The full environment is listed in piplist.txt.

Environment

- OS: Ubuntu 20.04
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

hm1229 · Feb 27 '25 17:02

same issue

4daJKong · Mar 17 '25 07:03

same issue

shuaijiang · Mar 17 '25 11:03

I solved this problem by updating the transformers version: run `pip install transformers==4.48.3` if your Python version is 3.10.
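To confirm the upgrade actually took effect in the training environment, a quick stdlib-only check can be run. This is a small helper sketch; the `4.48.3` threshold is taken from the comment above, and `importlib.metadata` queries the installed package metadata:

```python
import importlib.metadata

REQUIRED = "4.48.3"  # version reported as working in this thread

def version_tuple(v):
    """'4.48.3' -> (4, 48, 3); only the first three numeric parts are compared."""
    return tuple(int(p) for p in v.split(".")[:3])

try:
    installed = importlib.metadata.version("transformers")
    if version_tuple(installed) < version_tuple(REQUIRED):
        print(f"transformers {installed} is older than {REQUIRED}; "
              f"try `pip install transformers=={REQUIRED}`")
    else:
        print(f"transformers {installed} should be OK")
except importlib.metadata.PackageNotFoundError:
    print("transformers is not installed in this environment")
```

Make sure the check runs inside the same conda environment used for training (here `minitrain`), since the traceback shows that environment's site-packages.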

4daJKong · Mar 19 '25 06:03