ipex-llm
NPU inference error
Hi,
I am interested in the NPU inference support in this project.
I tried to run Llama on the NPU with python\llm\example\NPU\HF-Transformers-AutoModels\Model\llama2\generate.py.
I used model.save_low_bit
and AutoModelForCausalLM.load_low_bit
to save and reload the converted model, but during the load phase I get this error:
AttributeError: 'LlamaAttention' object has no attribute 'llama_attention_forward'
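
For reference, the save/load flow I used looks roughly like this (model path, save directory, and the load_in_low_bit value are placeholders based on the llama2 NPU example, not exact copies of my script):

```python
# Rough sketch of my save/load flow; paths and precision are placeholders.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Convert the model for NPU inference at load time.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    trust_remote_code=True,
    load_in_low_bit="sym_int4",
)

# Save the converted (low-bit) weights to disk.
model.save_low_bit("./llama2-npu-low-bit")

# Reload the converted model -- this is the step where the
# AttributeError about 'llama_attention_forward' is raised.
model = AutoModelForCausalLM.load_low_bit("./llama2-npu-low-bit")
```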
I am not sure whether saving and reloading this way produces a model that is no longer converted for the NPU?
Any comments or advice are appreciated, thanks!