
NPU inference sym_int4 model error


Hi everyone,

  1. When running inference with DeepSeek-7B or Qwen2.5-3B on the NPU with load_in_low_bit="sym_int4", the model does not report an error, but its response keeps repeating (example: question: "who are you"; answer: "who are you are you are you are ..."). With load_in_low_bit="sym_int8" the same model replies normally. Is there a problem with the current ipex-llm framework's support for sym_int4 models? (A minimal sketch of how I load the model follows this list.)
  2. I was able to run Qwen2-VL-2B successfully by following the official documentation, but I want to use Qwen2.5-VL-3B instead. When I swap in that model, it fails with an error about a mismatch between the old and new bias dimensions. Is Qwen2.5-VL-3B not supported yet? (See the second sketch below, after the documentation links.)
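
For reference, here is roughly how I load and run the model, following the NPU LLM example linked below. The model path and prompt are placeholders, and apart from load_in_low_bit the arguments are my best reconstruction of the example script, so treat this as a sketch rather than my exact command:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Placeholder checkpoint; the same behavior occurs with DeepSeek-7B.
model_path = "Qwen/Qwen2.5-3B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",  # output repeats; "sym_int8" works fine
    optimize_model=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "who are you"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```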

The official documentation I referenced:

https://github.com/intel/ipex-llm/tree/main/python/llm/example/NPU/HF-Transformers-AutoModels/LLM
https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Multimodal/qwen2-vl
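
For the second issue I only swapped the checkpoint path in the qwen2-vl example script. One possible lead (my assumption, not something the docs state): upstream transformers loads Qwen2.5-VL through a separate architecture class from Qwen2-VL, so a loading path written for Qwen2-VL may not match the new weights, which could explain the bias-dimension error:

```python
# Qwen2-VL, which the example script handles fine:
from transformers import Qwen2VLForConditionalGeneration

# Qwen2.5-VL has its own class in recent transformers releases;
# loading its checkpoint through the Qwen2-VL path may mismatch weights.
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",  # the checkpoint I tried
    torch_dtype="auto",
)
```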

I hope to hear back from you all!

Gusha-nye · Apr 17 '25 08:04