LLaMA-Factory: fine-tuning Qwen-omni with transformers (over 4.52.1) emits a large number of Qwen2VLVideoProcessor warnings

Reminder

[x] I have read the above rules and searched the existing issues.

System Info

llamafactory=0.9.4.dev0, transformers=4.52.1, torch==2.6.0, full finetune Qwen-omni.

To support Qwen-omni, the transformers version is required to be higher than 4.52.1. With that version, the following warning is printed a large number of times: [WARNING|image_processing_qwen2_vl.py:457] 2025-09-17 12:26:05,410 >> Qwen2VLImageProcessor works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to Qwen2VLVideoProcessor.

`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
[WARNING|image_processing_qwen2_vl_fast.py:175] 2025-09-17 12:52:40,014 >> `Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 

Temporary workaround (does not work): `import warnings; warnings.filterwarnings("ignore")`
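One possible reason the `warnings.filterwarnings("ignore")` attempt has no effect: the `[WARNING|file.py:line] timestamp >>` prefix suggests the message is emitted through Python's `logging` module rather than the `warnings` module, which `filterwarnings` does not touch. Below is a minimal sketch of a log filter that drops only this message. It assumes the records reach a handler attached to the `transformers` logger. `DropMessageFilter` and `mute_on_handlers` are hypothetical helper names, not part of transformers or LLaMA-Factory. The filter is attached to handlers rather than to the logger itself, because logger-level filters are not applied to records coming from child loggers.

```python
import logging

# Substring copied from the warning text quoted in this issue.
NOISY_SUBSTRING = "works only with image inputs and doesn't process videos anymore"


class DropMessageFilter(logging.Filter):
    """Drop any log record whose rendered message contains `substring`."""

    def __init__(self, substring: str):
        super().__init__()
        self.substring = substring

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the handler to discard the record.
        return self.substring not in record.getMessage()


def mute_on_handlers(logger_name: str, substring: str = NOISY_SUBSTRING) -> int:
    """Attach the filter to every handler of `logger_name`.

    Returns the number of handlers that were patched (0 means nothing changed).
    """
    logger = logging.getLogger(logger_name)
    for handler in logger.handlers:
        handler.addFilter(DropMessageFilter(substring))
    return len(logger.handlers)
```

Under the assumption above, calling `mute_on_handlers("transformers")` after transformers has set up its logging would hide the message while keeping other warnings. If it returns 0, the handlers were not configured yet or live on another logger. Data preprocessing that runs in separate worker processes may need the same call in each worker. A blunter alternative is `transformers.logging.set_verbosity_error()`, which hides all warnings from the library.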

Using DeepSpeed with ds_z3_config.json raises the error "Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cpu!" at the line "normalized_filter = 2 * cutoff * kaiser_window * sinc_filter" (transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py). ds_z2_config works fine.

Temporary workaround: use ZeRO-2.

Rapidly increasing memory consumption. Temporary workaround: tried streaming with the settings below, without success.

accelerator_config:
  dispatch_batches: false
streaming: true
max_steps: 1000
buffer_size: 256
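For readers unfamiliar with streaming mode, here is a minimal sketch of how a bounded shuffle buffer generally works. It assumes `buffer_size` above is the number of examples held for random sampling while streaming. It is an illustration of the general technique, not the actual implementation in LLaMA-Factory or the datasets library, and `buffered_shuffle` is a hypothetical name.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(stream: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items from `stream` in pseudo-random order using a bounded buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill phase: nothing is yielded yet
            continue
        # Buffer is full: emit a random resident item and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Stream exhausted: drain the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer
```

At most `buffer_size` examples are held by the shuffle step at any time, so this mechanism only bounds the memory used for shuffling. If the growth comes from elsewhere in the pipeline, streaming alone would not be expected to stop it, which would be consistent with the result reported above.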

Multi-node operation issues
cutoff_len does not work. Increasing cutoff_len to resolve the mismatch between visual tokens and visual features does not help.

Others

No response

Sep 17 '25 12:09 Vincent-ZHQ

Issue 1 still seems to be unresolved. It may be caused by the mismatch at line 1482: https://github.com/hiyouga/LLaMA-Factory/blob/2c6aded5d4f4ff23aa1887d16972afb3c2543ac3/src/llamafactory/data/mm_plugin.py#L1474-L1482

Oct 21 '25 06:10 rzhao-zhsq

Same issue with

qwen-vl-utils                     0.0.14
transformers                      4.57.1
llamafactory                      0.9.4.dev0

when fine-tuning qwen2.5-vl with video data.

Oct 22 '25 17:10 JingbiaoMei

Regarding issue 1, `pip install transformers==4.51.3` works for me.

Oct 31 '25 12:10 zechengtang