
After fine-tuning Qwen2-1.5B-Instruct and quantizing it with AWQ, an error occurred when running CPU inference with intel-extension-for-transformers. Previously, when I fine-tuned and quantized Qwen1.5-4B-Chat with the same method, intel-extension-for-transformers accelerated CPU inference without any problem.

Autism-al opened this issue 1 year ago · 1 comment

model.cpp: loading model from runtime_outs/ne_qwen2_q_autoround.bin
The number of ne_parameters is wrong.
init: n_vocab = 151936
init: n_embd = 1536
init: n_mult = 8960
init: n_head = 12
init: n_head_kv = 0
init: n_layer = 28
init: n_rot = 128
init: ftype = 0
init: max_seq_len = 32768
init: n_ff = 8960
init: n_parts = 1
MODEL_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/./models/qwen/qwen.h:48: false
/tmp/tmp9b4073w1: line 3: 55575 Aborted python /home/lmf/llm/Qwen2-finetuning/awq_intel_extension.py
ERROR conda.cli.main_run:execute(124): conda run python /home/lmf/llm/Qwen2-finetuning/awq_intel_extension.py failed. (See above for error)
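
For context, since awq_intel_extension.py is not shown, below is a minimal sketch of how CPU inference is typically invoked through intel-extension-for-transformers, which dispatches to the Neural Speed runtime that prints the model.cpp / init log above. The model path and prompt are placeholders, and the actual script may pass an explicit quantization config instead of load_in_4bit.

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Placeholder path to the fine-tuned, quantized Qwen2-1.5B-Instruct checkpoint
model_path = "path/to/qwen2-1.5b-instruct-quantized"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# load_in_4bit=True routes generation through the Neural Speed CPU backend,
# which converts the weights to a runtime_outs/*.bin file (as seen in the log)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```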

Autism-al · Sep 11 '24 03:09