The SFT command is shown below:

accelerate launch --num_machines 1 --num_processes 4 --main_process_port 29502 --multi_gpu \
    /root/work/filestorage/gaoshan/projects/OpenThinkIMG/r1_v/open_r1/sft.py \
    --output_dir /root/work/filestorage/gaoshan/projects/OpenThinkIMG/SFT \
    --model_name_or_path /root/work/filestorage/gaoshan/models/Qwen2.5-VL-7B-Instruct \
    --dataset_name /root/work/filestorage/gaoshan/dataset/OpenThinkIMG/OpenThinkIMG-Chart-SFT-2942/openthinkIMG_chart_SFT.json \
    --seed 42 \
    --learning_rate 2e-5 \
    --deepspeed /root/work/filestorage/gaoshan/projects/OpenThinkIMG/zero3.json \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --attn_implementation flash_attention_2 \
    --bf16 True \
    --num_train_epochs 2 \
    --run_name qwen2vl_sft_v1111 \
    --save_steps 100 \
    --warmup_ratio 0.1 \
    --save_only_model true
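With these flags, each optimizer step consumes `num_processes × per_device_train_batch_size × gradient_accumulation_steps` samples. A quick sketch of the arithmetic, assuming the dataset holds 2942 samples (inferred from the directory name `OpenThinkIMG-Chart-SFT-2942`, not confirmed) and that the script filters nothing out:

```python
import math


def effective_batch_size(num_processes: int, per_device_batch: int, grad_accum: int) -> int:
    # Samples consumed per optimizer step across all GPUs
    return num_processes * per_device_batch * grad_accum


def approx_optimizer_steps(num_samples: int, batch: int, epochs: int) -> int:
    # Rough estimate: a partial last batch still counts as one step per epoch
    return math.ceil(num_samples / batch) * epochs


batch = effective_batch_size(num_processes=4, per_device_batch=2, grad_accum=8)
steps = approx_optimizer_steps(num_samples=2942, batch=batch, epochs=2)
print(batch, steps)  # 64 92
```

If the estimate holds, about 92 total steps is below `--save_steps 100`, so no intermediate checkpoint would be written; whether a final model is saved depends on what the script does after training.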

An example entry from the data JSON file:

{
  "id": "000000",
  "image": "/root/work/filestorage/gaoshan/dataset/OpenThinkIMG/OpenThinkIMG-Chart-SFT-2942/openthinkIMG_chart_SFT_img/image_0.png",
  "conversations": [
    {
      "from": "human",
      "value": "You are a visual assistant capable of generating and solving steps for chart-based reasoning. Your goal is to answer chart-related questions. You can rely on your own capabilities or use external tools to assist in solving. The available actions include: OCR, Point, DrawHorizontalLineByY, DrawVerticalLineByX, ZoomInSubfigure, SegmentRegionAroundPoint, and Terminate.\nYour output should be in a strict JSON format as follows:\n{\"thought\": \"the reasoning process\", \"actions\": [{\"name\": \"action\", \"arguments\": {\"argument1\": \"value1\", \"argument2\": \"value2\"}}]}\n\n<image>\n\nQuestion: Look at Female Householders with related children under 18, Find the red bar, its value 20%, What is that bar ?"
    },
    {
      "from": "gpt",
      "value": "{\"thought\": \"To identify the label of the red bar representing 20% in the \\\"Female Householders with related children under 18\\\" category, I need to reference the legend in the chart that correlates colors with education categories. The red color needs to be matched with its corresponding label.\", \"actions\": []}"
    },
    {
      "from": "human",
      "value": "Look at Female Householders with related children under 18, Find the red bar, its value 20%, What is that bar ?"
    },
    .....
}
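A quick pre-flight check of the data file can rule out missing image files and malformed turns before launching training. A sketch, assuming the file is a JSON list of entries shaped like the example and that the first human turn should carry the `<image>` placeholder (inferred from the example, not from the loader's code); `check_sft_entries` and `check_sft_file` are hypothetical helpers:

```python
import json
import os
from typing import Any


def check_sft_entries(entries: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in OpenThinkIMG-style SFT entries."""
    problems = []
    for entry in entries:
        entry_id = entry.get("id", "<no id>")
        image = entry.get("image")
        if image is None:
            problems.append(f"{entry_id}: missing 'image' key")
        elif not os.path.isfile(image):
            problems.append(f"{entry_id}: image file not found: {image}")
        turns = entry.get("conversations", [])
        if not turns:
            problems.append(f"{entry_id}: empty 'conversations'")
            continue
        # Assumption: the dialogue starts with a human turn that holds the image placeholder
        if turns[0].get("from") != "human":
            problems.append(f"{entry_id}: first turn is not from 'human'")
        if "<image>" not in turns[0].get("value", ""):
            problems.append(f"{entry_id}: first turn has no <image> placeholder")
    return problems


def check_sft_file(path: str) -> list[str]:
    """Load a JSON list of entries from disk and check each one."""
    with open(path, encoding="utf-8") as f:
        return check_sft_entries(json.load(f))
```

Running `check_sft_file(...)` on the dataset path from the command and printing the first few problems is usually enough to spot a systematic path mismatch.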

However, I got the following error. Looking forward to your reply.

[rank3]: ValueError: The dataset appears to be vision-related (contains 'image' or 'images' keys), but the provided model does not seem to be a vision-language model. Please check your model and dataset.

Nov 11 '25 11:11 GaoXiaoshan

Looking at the error, the images were probably not passed in correctly; try debugging it again. The error comes from this check:

    self._is_vision_dataset = "image" in dataset_sample or "images" in dataset_sample
    if self._is_vision_dataset and not self._is_vlm:
        raise ValueError(
            "The dataset appears to be vision-related (contains 'image' or 'images' keys), but the provided "
            "model does not seem to be a vision-language model. Please check your model and dataset."
        )
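The check raises only when a sample has an `image`/`images` key and the trainer does not consider the model a VLM. As far as I know, recent TRL versions of `SFTTrainer` set `_is_vlm` according to whether the processing class is a multimodal processor (`ProcessorMixin`) rather than a plain tokenizer, so passing a tokenizer where a processor is expected could trigger this even for Qwen2.5-VL; treat that as an assumption to verify against the installed TRL version. The sketch below reproduces the decision in plain Python, with `is_vlm` standing in for that attribute:

```python
from typing import Any, Mapping


def check_vision_compat(dataset_sample: Mapping[str, Any], is_vlm: bool) -> bool:
    """Mirror of the quoted check; returns whether the sample counts as a vision sample."""
    is_vision_dataset = "image" in dataset_sample or "images" in dataset_sample
    if is_vision_dataset and not is_vlm:
        raise ValueError(
            "The dataset appears to be vision-related (contains 'image' or 'images' keys), "
            "but the provided model does not seem to be a vision-language model."
        )
    return is_vision_dataset


sample = {"id": "000000", "image": "image_0.png", "conversations": []}
print(check_vision_compat(sample, is_vlm=True))  # True
try:
    check_vision_compat(sample, is_vlm=False)
except ValueError as err:
    print("raised:", type(err).__name__)  # raised: ValueError
```

Since every entry in this dataset has an `image` key, the side worth inspecting is the `is_vlm` one, i.e. what the training script passes as the processing class.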

Nov 12 '25 09:11 zzzzzzyc

Thanks for the reply, that issue is solved. However, I still hit a multi-process problem during inference. In tf_eval.evaluator, my inference code hangs at res_log = dataset.evaluate() without any error, and the job is eventually aborted by an NCCL connection timeout. Have you run into this? If it's convenient, could we discuss it over WeChat? 🙏🙏🙏

    dataset = BaseEvalDataset(
        load_data_function=load_data_function,
        getitem_function=self.model.getitem_fn,
        evaluate_function=evaluate_function,
        task_config=task_config,
        task_args=self.task_args,
        model_args=self.model_args,
    )

    pdb.set_trace()

    self.inferencer.batch_inference(dataset)
    logger.info(f"batch inference complete")

    breakpoint()

    res_log = dataset.evaluate()
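Because the hang produces no error, it helps to see where each rank is blocked. A minimal sketch using Python's standard `faulthandler` module (a suggestion, not something tried in this thread): each process periodically dumps all of its thread stacks to stderr. Setting `NCCL_DEBUG=INFO` in the environment additionally makes NCCL log communicator setup and errors. The helper names are hypothetical:

```python
import faulthandler
import os
import sys


def enable_hang_dumps(interval_s: float = 300.0) -> None:
    """Dump every thread's Python stack to stderr every interval_s seconds."""
    # RANK is exported by torchrun / accelerate launch; default to 0 for single-process runs
    rank = os.environ.get("RANK", "0")
    out = sys.__stderr__  # the real stderr, which has a file descriptor faulthandler can use
    out.write(f"[rank {rank}] dumping stacks every {interval_s:.0f}s\n")
    out.flush()
    # repeat=True keeps dumping until cancelled, so a long hang yields several snapshots
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=out)


def disable_hang_dumps() -> None:
    """Stop the periodic dumps, e.g. once evaluation has returned."""
    faulthandler.cancel_dump_traceback_later()
```

Call `enable_hang_dumps()` on every rank before `dataset.evaluate()`. If one rank's traces show it waiting inside a collective while another is still in ordinary evaluation code, the ranks have diverged.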


Nov 17 '25 02:11 GaoXiaoshan

Also, in the issues, the author recommends not using DeepSpeed in the RL stage (issue 12). Hope this helps.


Nov 17 '25 03:11 GaoXiaoshan

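Sorry, I haven't run into this problem myself. For multi-process issues, I'd suggest first checking whether it runs on a single GPU, or letting only rank 0 run batch_inference(). An NCCL connection timeout can have many causes, and I'm not sure which step is failing.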
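Letting only rank 0 do the work can be sketched as a small wrapper. `run_on_rank0` is a hypothetical helper, not part of OpenThinkIMG or tf_eval; it assumes the launcher exports the global rank in the `RANK` environment variable (torchrun and `accelerate launch` do) and accepts an optional barrier such as `torch.distributed.barrier`:

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_on_rank0(fn: Callable[[], T], barrier: Optional[Callable[[], None]] = None) -> Optional[T]:
    """Run fn only on global rank 0; other ranks return None.

    If a barrier is given, every rank calls it after rank 0 finishes,
    so all processes leave this point together.
    """
    rank = int(os.environ.get("RANK", "0"))
    result = fn() if rank == 0 else None
    if barrier is not None:
        barrier()
    return result
```

One caveat: if the wrapped function itself calls a collective (gather, all_reduce, barrier), rank 0 will block waiting for peers that never join, which looks exactly like a silent hang ending in an NCCL timeout. If rank 0's work is merely slow, the other ranks' barrier can also time out; the limit is the `timeout` argument of `torch.distributed.init_process_group` (with Accelerate, `InitProcessGroupKwargs(timeout=...)`).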

Nov 17 '25 05:11 zzzzzzyc

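However, multiple processes are required for GRPO training with this TRL + vLLM framework: vLLM inference uses a lot of GPU memory, so the setup dedicates one GPU to vLLM inference and uses the other GPUs for training. Also, were you able to run the evaluation of the author's released model? I get stuck later on, in data.evaluator(). If it's convenient, could we discuss it over WeChat? 🙏🙏🙏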
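Under this layout, `--num_processes` for accelerate should be the total GPU count minus the GPUs reserved for vLLM. A sketch of that bookkeeping; `split_gpus` is hypothetical, and putting vLLM on the last device is an assumption, so check how the GRPO script actually selects the vLLM device:

```python
def split_gpus(num_gpus: int, num_vllm_gpus: int = 1) -> tuple[list[int], list[int]]:
    """Reserve the last num_vllm_gpus devices for vLLM generation; the rest run training."""
    if num_gpus <= num_vllm_gpus:
        raise ValueError("need at least one GPU left for training")
    train = list(range(num_gpus - num_vllm_gpus))
    vllm = list(range(num_gpus - num_vllm_gpus, num_gpus))
    return train, vllm


train_ids, vllm_ids = split_gpus(4)
print("--num_processes", len(train_ids))  # --num_processes 3
print(f"vLLM device: cuda:{vllm_ids[0]}")  # vLLM device: cuda:3
```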


Nov 17 '25 07:11 GaoXiaoshan

Error during SFT: [rank3]: ValueError: The dataset appears to be vision-related (contains 'image' or 'images' keys), but the provided model does not seem to be a vision-language model. Please check your model and dataset.
