intel-extension-for-transformers
Can't load woq int4 model
I'm trying to evaluate an int4 weight-only quantized model with the evaluation tools, using
from intel_extension_for_transformers.llm.evaluation.lm_eval import evaluate
like what is done in /examples/huggingface/pytorch/text_generation. It works when I quantize my local Llama-13B model with
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config, ...)
and pass this quantized model to the evaluate function:
results = evaluate(..., user_model=model, ...)
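Roughly, the working part of my flow looks like the sketch below (the WeightOnlyQuantConfig import and the evaluate arguments are filled in from the text-generation example rather than copied verbatim from my script):

from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig
from intel_extension_for_transformers.llm.evaluation.lm_eval import evaluate

model_name = "huggyllama/llama-13b"
# int4 weight-only quantization config (weight_dtype value taken from the example script)
woq_config = WeightOnlyQuantConfig(weight_dtype="int4_fullrange")

# quantization happens while loading the pretrained weights
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config)

# evaluating the in-memory quantized model works fine
results = evaluate(
    model="hf-causal",
    model_args=f"pretrained={model_name},tokenizer={model_name},dtype=float32",
    user_model=model,
    tasks=["lambada_openai"],
    batch_size=8,
)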
But when I save this quantized model and try to load it back, I get a size-mismatch
error. I used
model.save_pretrained(saved_dir)
user_model=AutoModelForCausalLM.from_pretrained(saved_dir)
and it raised
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([16435456]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
...
Can anyone tell me why this issue is occurring?
Hi, I guess this issue is caused by the shape of the compressed weight not matching the raw weight shape.
e.g. model.layers.0.self_attn.q_proj.weight would normally need 5120*5120*sizeof(float) bytes of data, but after WOQ compression we only need 16435456 bytes (containing the 4-bit weights, scales, etc.), so we create a 1D int8 tensor to hold the compressed weight, which then fails to load because of some shape-safety checks.
A temporary solution is to use the _resize func to reset the shape of the compressed weight, but this may waste memory (5120*5120*sizeof(int8) - 16435456 = 9778944 bytes).
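To make the arithmetic concrete, here is a tiny sketch (numbers taken from the error message above):

import torch

rows, cols = 5120, 5120        # raw q_proj.weight shape expected by LlamaForCausalLM
packed_nbytes = 16435456       # 1-D int8 blob: packed 4-bit weights, scales, etc.

expected = torch.empty(rows, cols)                      # shape the model definition declares
stored = torch.empty(packed_nbytes, dtype=torch.int8)   # shape saved in the WOQ checkpoint

# load_state_dict() only copies a parameter when the shapes match, so stored -> expected is rejected:
print(expected.shape, stored.shape)   # torch.Size([5120, 5120]) vs torch.Size([16435456])

# the _resize workaround pads the blob up to rows*cols int8 elements, wasting:
print(rows * cols - packed_nbytes, "bytes")   # 9778944 bytes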
We will try to find some better solutions; we also welcome smart ideas from the community :)
Hi @YangShuaiTHU, could you tell us which model you used? I tested loading/saving in the UT tests/CI/test_weight_only_gpu.py and it is OK. BTW, the Transformers version we used is 4.34.1.
Thanks for your reply! The model is the .safetensors checkpoint from https://huggingface.co/huggyllama/llama-13b, and my Transformers version is also 4.34.1.
Thank you!
Now the issue has been fixed.