Qwen/Qwen2.5-VL-7B-Instruct PPO training error
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/launcher.py", line 23, in <module>
[rank0]: launch()
[rank0]: File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/tuner.py", line 93, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/tuner.py", line 71, in _training_function
[rank0]: run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/ppo/workflow.py", line 72, in run_ppo
[rank0]: ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]: File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/ppo/trainer.py", line 240, in ppo_train
[rank0]: batch = next(dataiter)
[rank0]: File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/accelerate/data_loader.py", line 552, in __iter__
[rank0]: current_batch = next(dataloader_iter)
[rank0]: File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
[rank0]: data = self._next_data()
[rank0]: File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 764, in _next_data
[rank0]: data = self._dataset_fetcher.fetch(index) # may raise StopIteration
[rank0]: File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
[rank0]: return self.collate_fn(data)
[rank0]: File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/data/collator.py", line 150, in __call__
[rank0]: features[0]["labels"] = [IGNORE_INDEX] * len(fake_input_ids) + features[0]["labels"]
[rank0]: KeyError: 'labels'
```
Ranks 1–3 abort with the identical traceback (`KeyError: 'labels'` at `src/llamafactory/data/collator.py`, line 150).
```
[rank0]:[W305 11:06:18.525879493 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0305 11:06:19.600000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1855708 closing signal SIGTERM
W0305 11:06:19.601000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1855709 closing signal SIGTERM
W0305 11:06:19.601000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1855711 closing signal SIGTERM
E0305 11:06:20.066000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 2 (pid: 1855710) of binary: /root/anaconda3/envs/LLaMA-Factory-0.9.1/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
    run(args)
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
    elastic_launch(
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-03-05_11:06:19
  host      : wxhs-10.30.100.202
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 1855710)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
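For context, the failing collator line assumes every feature dict still carries a `labels` key, but the PPO data path strips that column before collation, so the lookup raises. A minimal, self-contained sketch of the failure mode (the feature dicts and token values below are hypothetical, not taken from the actual dataset):

```python
IGNORE_INDEX = -100  # sentinel LLaMA-Factory uses to mask prompt labels

fake_input_ids = [151652, 151653]  # stand-in image placeholder tokens

# SFT-style feature: "labels" is present, so the collator line succeeds.
sft_feature = {"input_ids": [1, 2, 3], "labels": [4, 5, 6]}
sft_feature["labels"] = [IGNORE_INDEX] * len(fake_input_ids) + sft_feature["labels"]
print(sft_feature["labels"])  # [-100, -100, 4, 5, 6]

# PPO-style feature: the "labels" column was pruned upstream, so the
# same line raises exactly the KeyError seen in the traceback above.
ppo_feature = {"input_ids": [1, 2, 3]}
try:
    ppo_feature["labels"] = [IGNORE_INDEX] * len(fake_input_ids) + ppo_feature["labels"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'labels'
```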
Reproduction
See the launch command and dataset configuration under "Others" below.
Others
Launch command:
```shell
llamafactory-cli train \
    --stage ppo \
    --do_train True \
    --model_name_or_path Qwen/Qwen2.5-VL-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen2_vl \
    --flash_attn auto \
    --dataset_dir data \
    --dataset post_score_train_data_v2 \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 1000 \
    --warmup_steps 0 \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-VL-7B-Instruct/lora/train_2025-03-05-11-02-07 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --lora_rank 128 \
    --lora_alpha 256 \
    --lora_dropout 0 \
    --lora_target all \
    --reward_model saves/Qwen2.5-VL-7B-Instruct/lora/train_2025-03-04-17-48-04 \
    --reward_model_type lora \
    --ppo_score_norm True \
    --ppo_whiten_rewards True \
    --top_k 0 \
    --top_p 0.9
```
The dataset format is:
```json
{
  "instruction": "### **图片质量评估与分类**\n\n#### **任务说明** \n请根据以下评分标准和判断因素,对图片质量进行综合评估,并将其分类为 **1 到 5 级**。\n\n---\n\n### **评分标准**\n\n- **1 级(极差)**:图片质量极差,色彩失真、构图混乱、模糊不清,无观赏价值。\n- **2 级(较差)**:图片质量较差,通常为随手拍摄或手机截图,存在明显瑕疵(如画面杂乱、色彩不协调、清晰度不足或构图欠佳),观赏价值较低。\n- **3 级(一般)**:图片质量一般,整体表现中规中矩,虽有一定亮点但存在明显缺陷,观赏价值有限。\n- **4 级(良好)**:图片质量较好,各方面表现较为均衡,细节清晰、构图合理,具有较高的观赏价值。\n- **5 级(优秀)**:图片质量出众,无论是色彩、构图、清晰度还是创意,都表现卓越,观赏价值极高。\n\n---\n\n### **判断因素**\n\n在评估图片时,请综合考虑以下 5 个因素,并根据图片在各方面的表现进行描述性评价:\n\n1. **色彩搭配**:图片颜色是否鲜艳、和谐,是否能够吸引观众的眼球。\n2. **构图**:图片的主体是否突出,构图是否合理,背景是否干净整洁。\n3. **清晰度**:图片是否清晰、细节是否明确,有无模糊或失焦现象。\n4. **创意**:图片是否具有独特视角或创意,是否能给人留下深刻印象。\n5. **情感表达**:图片是否能传递情感或讲述故事,是否能引起观众共鸣。\n\n---\n\n### **评估步骤**\n\n1. **分别对每个因素进行评价**,可以给出 1-5 分的评分,也可以采用文字描述说明各因素的优缺点。\n2. **结合各因素的表现,综合判断图片的整体质量**,给出最终的评价等级(1-5 级)。\n - 可以采用简单的平均思路,但更重要的是结合各项表现给出合理的综合判断。\n3. **详细说明评估依据**,描述图片在各个判断因素上的表现以及最终评级的理由。\n\n---\n\n### **输出格式**\n\n请以以下 **JSON 结构** 输出评估结果:\n\n```json\n{\n \"Thoughts\": \"<对图片各方面表现及最终评级依据的详细解释>\",\n \"Category\": \"<1 / 2 / 3 / 4 / 5>\"\n}\n```\n\n---\n\n### **示例**\n\n假设某图片的评估情况如下: \n- **色彩搭配**:色彩较为鲜明且和谐,但缺乏亮点。 \n- **构图**:构图合理,但主体稍显模糊,背景有些杂乱。 \n- **清晰度**:整体较为清晰,但部分区域存在轻微模糊。 \n- **创意**:创意一般,没有特别独到的视角。 \n- **情感表达**:情感传递不够强烈,缺乏感染力。\n\n综合考虑各方面表现,最终认为该图片整体质量属于**3级(一般)**。\n\n**输出示例**:\n\n```json\n{\n \"Thoughts\": \"该图片色彩较为和谐,但整体缺乏亮点;构图尚可,但主体不够突出且背景略显杂乱;清晰度一般,局部存在模糊;创意及情感表达方面表现平平,未能激起强烈共鸣。\",\n \"Category\": \"3\"\n}\n```\n\n---\n\n请按照以上说明,对图片进行综合评估与分类。",
  "input": "<image>",
  "output": "2",
  "images": [
    "images/post_score_image/Epk44nchif_resize.webp?imginfo=w3904,h2928"
  ]
}
```
The corresponding entry in `dataset_info.json` is:

```json
"post_score_train_data_v2": {
  "file_name": "post_score_image_v2/train_data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "images": "images"
  }
}
```
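Unrelated to the collator bug, it may still be worth sanity-checking that every sample in the dataset file actually carries the fields the `columns` mapping above points at. A small sketch (the sample record below is abbreviated and hypothetical):

```python
# Fields referenced by the dataset_info.json "columns" mapping above.
columns = {"prompt": "instruction", "query": "input",
           "response": "output", "images": "images"}

# Abbreviated stand-in for one record of train_data.json.
samples = [{
    "instruction": "Rate the image quality on a 1-5 scale ...",
    "input": "<image>",
    "output": "2",
    "images": ["images/post_score_image/example.webp"],
}]

for i, sample in enumerate(samples):
    missing = [field for field in columns.values() if field not in sample]
    assert not missing, f"sample {i} is missing fields: {missing}"
print("all samples carry the mapped fields")
```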
I am also able to reproduce with qwen2.5-vl-7b, using vllm==0.7.3
We recommend using EasyR1 for RL; the RL implementation in LlamaFactory is temporarily bugged: https://github.com/hiyouga/EasyR1
Will this issue still be followed up?
We hit the same problem when trying PPO on Qwen2-VL, using a reward model we trained separately ourselves. We had previously run DPO training directly on the same setup without any problem.
It is quite inconvenient that EasyR1 does not support LoRA yet.
Ran into the same problem.
Ran into the same problem.
In `llama_factory_vl/lib/python3.10/site-packages/trl/trainer/ppo_trainer.py`, line 416, `dataset = self._remove_unused_columns(dataset)` discards the `labels` column. A quick-and-dirty workaround is to change that line to `dataset = dataset`.
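Instead of editing the installed trl sources, the same effect can be had by monkey-patching the method before the trainer is constructed. The sketch below uses a stand-in class so it stays self-contained; in practice the patch target would be `trl.trainer.ppo_trainer.PPOTrainer._remove_unused_columns` (verify the method name against your installed trl version, as internals change between releases):

```python
# Stand-in mimicking trl's column pruning; the real method lives on trl's
# PPOTrainer and drops columns the model's forward signature doesn't accept.
class PPOTrainer:
    def _remove_unused_columns(self, dataset):
        return {k: v for k, v in dataset.items() if k == "input_ids"}

# The workaround: make pruning a no-op, equivalent to rewriting line 416
# as `dataset = dataset`, so the multimodal collator still sees "labels".
PPOTrainer._remove_unused_columns = lambda self, dataset: dataset

trainer = PPOTrainer()
dataset = {"input_ids": [1, 2, 3], "labels": [4, 5, 6]}
pruned = trainer._remove_unused_columns(dataset)
print("labels" in pruned)  # True
```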