
Tuning Qwen-omni with transformers (above 4.52.1) produces a large number of Qwen2VLVideoProcessor warnings

Open Vincent-ZHQ opened this issue 4 months ago • 3 comments

Reminder

  • [x] I have read the above rules and searched the existing issues.

System Info

llamafactory=0.9.4.dev0, transformers=4.52.1, torch==2.6.0, full finetune Qwen-omni.

  1. To support Qwen-omni, the transformers version is required to be above 4.52.1. With it, training prints a large number of warnings:
[WARNING|image_processing_qwen2_vl.py:457] 2025-09-17 12:26:05,410 >> Qwen2VLImageProcessor works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to Qwen2VLVideoProcessor.
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
[WARNING|image_processing_qwen2_vl_fast.py:175] 2025-09-17 12:52:40,014 >> `Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 
`Qwen2VLImageProcessorFast` works only with image inputs and doesn't process videos anymore. This is a deprecated behavior and will be removed in v5.0. Your videos should be forwarded to `Qwen2VLVideoProcessor`. 

Temporary workaround (does not work): import warnings; warnings.filterwarnings("ignore")
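One possible reason the workaround has no effect: the `[WARNING|file.py:line] ... >>` prefix looks like output from a logger, and `warnings.filterwarnings` only affects messages raised through Python's `warnings` module. The sketch below uses only the standard library to show the difference, and how a `logging.Filter` can drop one specific message. The logger name `demo.image_processor` and the helper classes are hypothetical, and whether the real messages can be filtered this way in transformers is an assumption I have not verified.

```python
import logging
import warnings


class DropMessageFilter(logging.Filter):
    """Drop log records whose message contains a given substring."""

    def __init__(self, needle: str) -> None:
        super().__init__()
        self.needle = needle

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging machinery to discard the record.
        return self.needle not in record.getMessage()


class ListHandler(logging.Handler):
    """Collect emitted messages in a list so the effect can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def demo() -> tuple[int, int]:
    logger = logging.getLogger("demo.image_processor")  # hypothetical name
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)

    msg = "Your videos should be forwarded to Qwen2VLVideoProcessor."

    # Ignoring Python warnings does not touch logger output.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        logger.warning(msg)
    before = len(handler.messages)

    # A logging filter on the handler drops only the matching message.
    handler.addFilter(DropMessageFilter("Qwen2VLVideoProcessor"))
    logger.warning(msg)
    logger.warning("unrelated warning")
    after = len(handler.messages)

    logger.removeHandler(handler)
    return before, after


if __name__ == "__main__":
    print(demo())
```

A coarser alternative is `transformers.utils.logging.set_verbosity_error()`, which lowers the library's log verbosity as a whole and would therefore hide other warnings too.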

  2. With DeepSpeed ZeRO-3 (ds_z3_config.json), training fails with the error "Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cpu!", raised at "normalized_filter = 2 * cutoff * kaiser_window * sinc_filter" (transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py). ds_z2_config works fine.

Temporary workaround: use ZeRO-2.

  3. Memory consumption increases rapidly. Temporary workaround: tried streaming with the settings below, without success.
accelerator_config:
  dispatch_batches: false
streaming: true
max_steps: 1000
buffer_size: 256
  4. Multi-node operation issues

  5. cutoff_len does not work. Increasing cutoff_len to resolve the mismatch between visual tokens and visual features has no effect.

Others

No response

Vincent-ZHQ avatar Sep 17 '25 12:09 Vincent-ZHQ

Issue 1 still seems unresolved? It may be caused by the mismatch at line 1482: https://github.com/hiyouga/LLaMA-Factory/blob/2c6aded5d4f4ff23aa1887d16972afb3c2543ac3/src/llamafactory/data/mm_plugin.py#L1474-L1482

rzhao-zhsq avatar Oct 21 '25 06:10 rzhao-zhsq

Same issue with

qwen-vl-utils                     0.0.14
transformers                      4.57.1
llamafactory                      0.9.4.dev0 

when fine-tuning qwen2.5-vl with video data.

JingbiaoMei avatar Oct 22 '25 17:10 JingbiaoMei

About issue 1, pip install transformers==4.51.3 works for me.

zechengtang avatar Oct 31 '25 12:10 zechengtang