Transformers and other issues during V-ToolRL training

Open yxizhong opened this issue 5 months ago • 0 comments

Thank you for open-sourcing this project! I am trying to train a model with the framework. The SFT phase seems to work, but the RL phase produces many errors. I ran tool_grpo.py with the parameters given in the "V-ToolRL: Reinforcement Learning with Vision Tools" section of README.md:

torchrun --nproc_per_node=2 \
        --nnodes="1" \
        --node_rank="0" \
        --master_addr="$MASTER_ADDR" \
        --master_port=29503 \
        r1_v/open_r1/tool_grpo.py --use_vllm False \
        --output_dir ***/Qwen2-VL-8b-instruct_rl_20250710 \
        --model_name_or_path ***/Qwen2-VL-8b-instruct_20250708 \
        --dataset_name ***/OpenThinkIMG/data/openthinkimg/rl/output_data.json \
        --max_prompt_length 16000 \
        --max_completion_length 2048 \
        --temperature 1.0 \
        --seed 42 \
        --learning_rate 1e-6 \
        --num_generations 8 \
        --lr_scheduler_type "constant" \
        --vllm_gpu_memory_utilization 0.8 \
        --deepspeed ***/OpenThinkIMG/configs/ds_z3_offload_config.json \
        --per_device_train_batch_size 4 \
        --gradient_accumulation_steps 12 \
        --logging_steps 1 \
        --bf16 true \
        --report_to wandb \
        --gradient_checkpointing true \
        --attn_implementation flash_attention_2 \
        --max_pixels 200000 \
        --num_train_epochs 1 \
        --run_name OTK-RL \
        --save_steps 100 \
        --save_only_model true \
        --controller_addr *** \
        --use_tool true \

When use_tool is set to true, an error keeps appearing:

2025-07-10 18:23:55 | ERROR | stderr | [rank0]:   File "/mnt/petrelfs/yangxizhong/miniconda3/envs/tool-server/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1028, in get_rope_index
2025-07-10 18:23:55 | ERROR | stderr | [rank0]:     input_ids = input_ids[attention_mask[i].to(input_ids.device) == 1]
2025-07-10 18:23:55 | ERROR | stderr | [rank0]: IndexError: The shape of the mask [803] at index 0 does not match the shape of the indexed tensor [1] at index 0

While debugging, I found that in the while loop of the _sample function in transformers.generation.utils, input_ids is truncated when the cache is used, but the attention mask is not. This causes a dimension mismatch in the input_ids = input_ids[attention_mask[i].to(input_ids.device) == 1] operation in get_rope_index in modeling_qwen2_vl.py. I tried changing it as follows:

if attention_mask[i].shape[0] != input_ids.shape[0]:
    # Lengths differ: truncate both to the shorter length (keeps the leading positions)
    min_length = min(attention_mask[i].shape[0], input_ids.shape[0])
    input_ids = input_ids[:min_length]
    attention_mask_aligned = attention_mask[i][:min_length].to(input_ids.device)
else:
    # Lengths already match: use the mask as-is
    attention_mask_aligned = attention_mask[i].to(input_ids.device)
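
One thing that may be worth checking in the workaround above is which end of the mask gets kept. The sketch below uses plain Python lists in place of tensors, so it runs without PyTorch. It assumes a left-padded prompt and that the truncated input_ids holds only the newest token during cached decoding; both are assumptions, not something confirmed by the traceback. The helper names (boolean_select, align_head, align_tail) are hypothetical and are not part of transformers.

```python
def boolean_select(values, mask):
    """Mimic tensor[mask == 1]: the mask must have the same length as the values."""
    if len(values) != len(mask):
        raise IndexError(
            f"mask length {len(mask)} does not match indexed length {len(values)}"
        )
    return [v for v, m in zip(values, mask) if m == 1]


def align_head(values, mask):
    """Keep the leading positions of both, like the [:min_length] workaround."""
    n = min(len(values), len(mask))
    return values[:n], mask[:n]


def align_tail(values, mask):
    """Keep the trailing positions of both (the most recent tokens)."""
    n = min(len(values), len(mask))
    if n == 0:
        return [], []
    return values[-n:], mask[-n:]


# Assumed scenario: two left-pad positions, four real positions so far,
# and only the newest token passed in because the cache holds the rest.
full_mask = [0, 0, 1, 1, 1, 1]
step_ids = [42]

try:
    boolean_select(step_ids, full_mask)
except IndexError as err:
    print("unaligned:", err)

head_ids, head_mask = align_head(step_ids, full_mask)
tail_ids, tail_mask = align_tail(step_ids, full_mask)
print("head-aligned:", boolean_select(head_ids, head_mask))  # []
print("tail-aligned:", boolean_select(tail_ids, tail_mask))  # [42]
```

Under these assumptions, slicing from the front pairs the new token with a padding entry of the mask and silently drops it, while slicing from the back keeps it. Whether this applies to the actual tensors in get_rope_index would need to be confirmed by printing their shapes and contents.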

However, this leads to new errors, such as:

2025-07-14 19:59:55 | ERROR | stderr | [rank0]:   File "/mnt/petrelfs/yangxizhong/miniconda3/envs/tool-server/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 875, in forward
2025-07-14 19:59:55 | ERROR | stderr | [rank0]:     "full_attention": create_causal_mask(**mask_kwargs),
2025-07-14 19:59:55 | ERROR | stderr | [rank0]:   File "/mnt/petrelfs/yangxizhong/miniconda3/envs/tool-server/lib/python3.10/site-packages/transformers/masking_utils.py", line 753, in create_causal_mask
2025-07-14 19:59:55 | ERROR | stderr | [rank0]:     early_exit, attention_mask, packed_sequence_mask, kv_length, kv_offset = _preprocess_mask_arguments(
2025-07-14 19:59:55 | ERROR | stderr | [rank0]:   File "/mnt/petrelfs/yangxizhong/miniconda3/envs/tool-server/lib/python3.10/site-packages/transformers/masking_utils.py", line 704, in _preprocess_mask_arguments
2025-07-14 19:59:55 | ERROR | stderr | [rank0]:     position_ids = position_ids.expand(batch_size, -1)
2025-07-14 19:59:55 | ERROR | stderr | [rank0]: RuntimeError: expand(torch.cuda.LongTensor{[3, 8, 856]}, size=[8, -1]):  the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (3)
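
The shape rule behind this RuntimeError can be sketched in plain Python. The position_ids in the trace have shape [3, 8, 856]; as far as I understand, Qwen2-VL's multimodal rotary embedding carries three position components (temporal, height, width) in the leading dimension, while the expand(batch_size, -1) call only supplies two sizes. The expand_shape function below is a hypothetical stand-in that mirrors the documented shape rules of torch.Tensor.expand; it is not the library's implementation.

```python
def expand_shape(shape, sizes):
    """Return the shape that tensor.expand(*sizes) would produce, or raise."""
    if len(sizes) < len(shape):
        raise RuntimeError(
            f"the number of sizes provided ({len(sizes)}) must be greater or equal "
            f"to the number of dimensions in the tensor ({len(shape)})"
        )
    offset = len(sizes) - len(shape)  # count of new leading dimensions
    result = []
    for i, size in enumerate(sizes):
        if i < offset:
            # A brand-new leading dimension needs an explicit size.
            if size == -1:
                raise RuntimeError("-1 is not allowed for a new leading dimension")
            result.append(size)
            continue
        current = shape[i - offset]
        if size == -1 or size == current:
            result.append(current)  # dimension left unchanged
        elif current == 1:
            result.append(size)     # singleton dimension is broadcast
        else:
            raise RuntimeError(
                f"expanded size ({size}) must match existing size ({current})"
            )
    return tuple(result)


# Shape taken from the traceback: 3 components, batch of 8, 856 positions.
try:
    expand_shape((3, 8, 856), (8, -1))
except RuntimeError as err:
    print("3-D position_ids:", err)

# A 2-D position_ids of shape [1, 856] would expand without error.
print(expand_shape((1, 856), (8, -1)))  # (8, 856)
```

This suggests the failure comes from 3-D position_ids reaching a code path that expects 2-D ones, rather than from the sequence length itself.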

I also tried switching transformers to 4.47.0, the version given in tool_server_requirements.txt, but that causes many other problems, such as import errors. I am currently using transformers 4.53.2.
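
To make the version gap explicit when reporting this, a small standard-library check can compare the installed package against the pinned one. This is only a sketch: the pin 4.47.0 is taken from the sentence above, and parse_version and installed_version are hypothetical helpers that handle simple dotted versions, not full PEP 440 syntax.

```python
import re
from importlib import metadata

PINNED = "4.47.0"  # version named in tool_server_requirements.txt, per this issue


def parse_version(text):
    """Turn '4.53.2' into (4, 53, 2); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("transformers")
if found is None:
    print("transformers is not installed in this environment")
elif parse_version(found) != parse_version(PINNED):
    print(f"transformers {found} is installed, but {PINNED} is pinned")
else:
    print(f"transformers {found} matches the pinned version")
```

Running it inside the training environment shows at a glance whether the traceback comes from the pinned release or a newer one.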

I would like to ask the author whether V-ToolRL can be run directly as described in README.md. Does it require any other changes, such as modifications to the installed Python libraries?

Jul 15 '25 01:07 yxizhong