
NPU inference sym_int4 model error


Hi everyone,

  1. When running inference with DeepSeek-7B or Qwen2.5-3B on the NPU with load_in_low_bit="sym_int4", the model does not report an error, but its response keeps repeating (example: question: "who are you"; answer: "who are you are you are you are ..."). With load_in_low_bit="sym_int8" the same model replies normally. Is there a problem with the current ipex-llm framework's support for sym_int4 models? (A minimal sketch of how I load the model follows this list.)
  2. I was able to run Qwen2-VL-2B successfully by following the official documentation, but I want to use Qwen2.5-VL-3B instead. When I swap in that model, it fails with an error about a mismatch between the old and new bias dimensions. Is Qwen2.5-VL-3B not supported yet? (See the second sketch below, after the documentation links.)
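
For reference, here is roughly how I load and run the model, following the NPU LLM example linked below. The model path and prompt are placeholders, and apart from load_in_low_bit the arguments are my best reconstruction of the example script, so treat this as a sketch rather than my exact command:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Placeholder checkpoint; the same behavior occurs with DeepSeek-7B.
model_path = "Qwen/Qwen2.5-3B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",  # output repeats; "sym_int8" works fine
    optimize_model=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "who are you"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```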

The official documentation I referenced:

https://github.com/intel/ipex-llm/tree/main/python/llm/example/NPU/HF-Transformers-AutoModels/LLM
https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Multimodal/qwen2-vl
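
For the second issue I only swapped the checkpoint path in the qwen2-vl example script. One possible lead (my assumption, not something the docs state): upstream transformers loads Qwen2.5-VL through a separate architecture class from Qwen2-VL, so a loading path written for Qwen2-VL may not match the new weights, which could explain the bias-dimension error:

```python
# Qwen2-VL, which the example script handles fine:
from transformers import Qwen2VLForConditionalGeneration

# Qwen2.5-VL has its own class in recent transformers releases;
# loading its checkpoint through the Qwen2-VL path may mismatch weights.
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",  # the checkpoint I tried
    torch_dtype="auto",
)
```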

I hope to hear back from you all!

Gusha-nye · Apr 17 '25 08:04