ipex-llm
NPU inference error
Hi,
I am interested in the NPU inference support in this project.
I tried to run Llama on the NPU with python\llm\example\NPU\HF-Transformers-AutoModels\Model\llama2\generate.py.
I used model.save_low_bit
and AutoModelForCausalLM.load_low_bit
to save and reload the converted model, but during the load phase I get this error:
AttributeError: 'LlamaAttention' object has no attribute 'llama_attention_forward'
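
For reference, the save/load flow I used looks roughly like this (model path, save directory, and the load_in_low_bit value are placeholders based on the llama2 NPU example, not exact copies of my script):

```python
# Rough sketch of my save/load flow; paths and precision are placeholders.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Convert the model for NPU inference at load time.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    trust_remote_code=True,
    load_in_low_bit="sym_int4",
)

# Save the converted (low-bit) weights to disk.
model.save_low_bit("./llama2-npu-low-bit")

# Reload the converted model -- this is the step where the
# AttributeError about 'llama_attention_forward' is raised.
model = AutoModelForCausalLM.load_low_bit("./llama2-npu-low-bit")
```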
I am not sure whether saving and reloading this way produces a model that is no longer converted for the NPU?
Any comments or advice are appreciated, thanks!