Cannot run Qwen2.5 Omni with the latest transformers
**I can run Qwen2.5 Omni with `pip install git+https://github.com/huggingface/transformers@f742a644ca32e65758c3adb36225aef1731bd2a8`, and it runs successfully. But I want to use the multi-batch function, and that commit does not contain this function, so I have to update the transformers version to 4.52.3 or later.
Then an error like this occurs:**
```
Traceback (most recent call last):
  File "C:\Work\Omni\resource\cptest\test_7B\quan.py", line 77, in
```
My inference code is as follows:
```python
import soundfile as sf
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info
from ipex_llm import optimize_model

# Raw strings so Windows backslashes (e.g. "\t" in "\test20.wav") are not
# interpreted as escape sequences.
model_path = r"C:\Work\Omni\resource\Qwen2.5-Omni-3B"
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(model_path, enable_audio_output=False)
model = optimize_model(model, low_bit='sym_int4', optimize_llm=True,
                       modules_to_not_convert=["audio_tower", "visual", "token2wav"])
model = model.half().to('xpu')

processor = Qwen2_5OmniProcessor.from_pretrained(r"C:\Work\Omni\resource\Qwen2.5-Omni-3B")

conversation1 = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are Qwen..."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the audio"},
            {"type": "audio", "audio": r"C:\Work\Omni\resource\cptest\test_7B\test20.wav"},
        ],
    },
]

conversation2 = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": r"C:\Work\Omni\resource\cptest\test_7B\test48.wav"},
        ],
    },
]

conversation = [conversation1, conversation2]

USE_AUDIO_IN_VIDEO = True
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=USE_AUDIO_IN_VIDEO)

inputs = processor(
    text=text,
    audio=audios,
    images=images,
    videos=videos,
    return_tensors="pt",
    padding=True,
    use_audio_in_video=USE_AUDIO_IN_VIDEO,
)
inputs = inputs.to(model.device).to(model.dtype)

print("\n===== Model input 1 =====")
print(f"input_ids shape: {inputs.input_ids.shape}")
if hasattr(inputs, "audio_features"):
    print(f"audio_features shape: {inputs.audio_features.shape}")
else:
    print("audio_features not present")

text_ids = model.generate(**inputs, use_audio_in_video=USE_AUDIO_IN_VIDEO, return_audio=False)
text = processor.batch_decode(text_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print("========================")
print(text)
print("========================")
```
I found that if I set `optimize_llm=False`, the inference succeeds. I don't know why this happens.
Hi @ZhangWei125521
If `optimize_llm=False` is fine, that means we have a transformers version mismatch. According to your info, 4.52.3 is not well supported.
Can you try `transformers==4.37` (the default version recommended by IPEX-LLM)?
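As a quick sanity check before re-running the optimization, you can confirm which version is actually installed in the environment (a minimal sketch, not specific to IPEX-LLM):

```python
# Confirm the installed transformers version; 4.37.x is the version
# recommended above for running with optimize_llm=True.
import transformers
print(transformers.__version__)
```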
Hi @qiyuangong, thanks for your reply, but I am aiming at Qwen2.5 Omni multi-batch inference with IPEX-LLM, and `transformers==4.37` does not support Qwen2.5 Omni.
OK. In that case, you can only use `optimize_llm=False` with the higher-version transformers (for the multi-batch feature). That means IPEX-LLM will only optimize the linear layers of your model.
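Concretely, that would look like the following in your script above (a minimal sketch: only the `optimize_llm` flag changes, the low-bit linear conversion still applies):

```python
# With transformers >= 4.52 (needed for multi-batch), skip the
# version-specific LLM optimizations and keep only the low-bit
# linear-layer conversion.
model = optimize_model(model, low_bit='sym_int4', optimize_llm=False,
                       modules_to_not_convert=["audio_tower", "visual", "token2wav"])
```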
Hi @ZhangWei125521,
We recently added support for Qwen2.5 Omni models in llm-scaler-vllm. You can now test this functionality using the Docker image `intel/llm-scaler-vllm:0.2.0-b2`.
For implementation details and guidance, please refer to our Omni Model documentation.
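For a quick smoke test once the container is up, something like the following should work. This is a hedged sketch: it assumes the container exposes vLLM's standard OpenAI-compatible API on its default port 8000 and serves the model under its Hugging Face id `Qwen/Qwen2.5-Omni-3B`; check the Omni Model documentation for the exact serving configuration.

```python
# Hypothetical smoke test against a running llm-scaler-vllm container.
# Assumes vLLM's OpenAI-compatible server on localhost:8000 and the model
# registered as "Qwen/Qwen2.5-Omni-3B"; adjust both to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-3B",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```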