Fine-tuning Qwen-Omni with transformers (4.52.1 or later) prints a large number of `Qwen2VLVideoProcessor` warnings
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
llamafactory==0.9.4.dev0, transformers==4.52.1, torch==2.6.0; full fine-tuning of Qwen-Omni.
- To support Qwen-Omni, transformers must be version 4.52.1 or later. With that version, a large number of warnings like the following are printed:
[WARNING|image_processing_qwen2_vl.py:457] 2025-09-17 12:26:05,410 >> `Qwen2VLImageProcessor` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`.
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`.
[WARNING|image_processing_qwen2_vl_fast.py:175] 2025-09-17 12:52:40,014 >> `Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`.
Temporary workaround (does not work): `import warnings; warnings.filterwarnings("ignore")`
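A likely reason `warnings.filterwarnings` has no effect: the `[WARNING|file.py:line]` prefix suggests these messages go through Python's `logging` module rather than the `warnings` module. Below is a minimal sketch of a logging filter that drops records containing the deprecation text. The helper names (`DropMessageFilter`, `silence`) are made up for illustration, and the two logger names are assumptions derived from the file names in the log output; I have not verified them against the transformers source.

```python
import logging


class DropMessageFilter(logging.Filter):
    """Drop log records whose rendered message contains a given substring."""

    def __init__(self, needle: str):
        super().__init__()
        self.needle = needle

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging machinery to discard the record.
        return self.needle not in record.getMessage()


def silence(logger_name: str, needle: str) -> DropMessageFilter:
    """Attach a DropMessageFilter to the named logger and return it."""
    flt = DropMessageFilter(needle)
    logging.getLogger(logger_name).addFilter(flt)
    return flt


# ASSUMPTION: module-level logger names guessed from the file names in the log.
ASSUMED_LOGGERS = (
    "transformers.models.qwen2_vl.image_processing_qwen2_vl",
    "transformers.models.qwen2_vl.image_processing_qwen2_vl_fast",
)
for name in ASSUMED_LOGGERS:
    silence(name, "doesn't process videos anymore")
```

A filter attached to a logger only sees records emitted directly on that logger, so this does nothing if the real logger names differ. In a multi-process run it would also have to be installed in every worker process before training starts.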
- With DeepSpeed ZeRO-3 (ds_z3_config.json), training fails with "Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cpu!", raised at `normalized_filter = 2 * cutoff * kaiser_window * sinc_filter` (transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py). ds_z2_config works.
Temporary workaround: use ZeRO-2.
- Rapidly increasing memory consumption
Temporary workaround: tried streaming with the settings below, but it did not help.
accelerator_config:
  dispatch_batches: false
streaming: true
max_steps: 1000
buffer_size: 256
- Multi-node operation issues
- `cutoff_len` does not work. Increasing `cutoff_len` to resolve the mismatch between visual tokens and visual features has no effect.
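To narrow down the visual token/feature mismatch, it may help to compare the number of placeholder tokens that remain in a sample after truncation with the number of features the vision encoder will produce. The sketch below assumes the Qwen2-VL convention that each `(t, h, w)` grid yields `t * h * w / merge_size**2` tokens, with `merge_size=2`. The function names and the placeholder id are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def expected_visual_tokens(
    grid_thw: Iterable[Tuple[int, int, int]], merge_size: int = 2
) -> int:
    """Number of visual features implied by the (t, h, w) patch grids.

    ASSUMPTION: patches are merged merge_size x merge_size spatially.
    """
    return sum(t * h * w // (merge_size ** 2) for t, h, w in grid_thw)


def check_visual_alignment(
    input_ids: Sequence[int],
    placeholder_id: int,
    grid_thw: Iterable[Tuple[int, int, int]],
    merge_size: int = 2,
) -> Tuple[int, int, bool]:
    """Return (placeholders found, features expected, whether they match)."""
    found = sum(1 for tok in input_ids if tok == placeholder_id)
    expected = expected_visual_tokens(grid_thw, merge_size)
    return found, expected, found == expected


# Hypothetical placeholder id; look up the real one in the tokenizer.
PLACEHOLDER = 9
ids = [1, 2] + [PLACEHOLDER] * 4 + [3]
print(check_visual_alignment(ids, PLACEHOLDER, [(1, 4, 4)]))      # (4, 4, True)
print(check_visual_alignment(ids[:4], PLACEHOLDER, [(1, 4, 4)]))  # (2, 4, False)
```

If the counts already differ before truncation, `cutoff_len` is not the cause, which would be consistent with raising it having no effect.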
Others
No response
Issue 1 still seems unresolved. It may be caused by the mismatch at line 1482: https://github.com/hiyouga/LLaMA-Factory/blob/2c6aded5d4f4ff23aa1887d16972afb3c2543ac3/src/llamafactory/data/mm_plugin.py#L1474-L1482
Same issue when fine-tuning Qwen2.5-VL with video data, using:
- qwen-vl-utils 0.0.14
- transformers 4.57.1
- llamafactory 0.9.4.dev0
Regarding issue 1, `pip install transformers==4.51.3` works for me.