AutoAWQ
about the shape of qzeros in awq quantization model
@casper-hansen Hi, I have a question about the AWQ-quantized model on Hugging Face: https://huggingface.co/TheBloke/Llama-2-7B-AWQ/tree/main?show_file_info=model.safetensors.
The shapes of qzeros and scales are as follows:

| tensor | shape | dtype |
| --- | --- | --- |
| model.layers.0.mlp.down_proj.qweight | [11008, 512] | I32 |
| model.layers.0.mlp.down_proj.qzeros | [86, 512] | I32 |
| model.layers.0.mlp.down_proj.scales | [86, 4096] | F16 |
Why is the second dimension of scales 4096 rather than 1? As I understand it, all weights in a quantization group share the same qzero and scale, yet I have checked and the 4096 values are not identical.
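For reference, the shapes above follow directly from the layer dimensions: down_proj has in_features=11008 and out_features=4096, groups run along the input dimension, and eight 4-bit values are packed into each int32 along the output dimension. A minimal sketch of that arithmetic (group_size=128 and 4-bit quantization are assumptions consistent with the listed shapes):

```python
# Assumed parameters for down_proj in Llama-2-7B with AWQ:
in_features, out_features = 11008, 4096
group_size = 128           # one (scale, zero) pair per 128 input channels
pack_factor = 32 // 4      # eight 4-bit values fit in one int32

# qweight: quantized weights, packed 8-per-int32 along the output dim
qweight_shape = (in_features, out_features // pack_factor)            # (11008, 512)
# qzeros: one zero-point per (group, output channel), also packed
qzeros_shape = (in_features // group_size, out_features // pack_factor)  # (86, 512)
# scales: one fp16 scale per (group, output channel), NOT packed
scales_shape = (in_features // group_size, out_features)              # (86, 4096)

print(qweight_shape, qzeros_shape, scales_shape)
```

So the second dimension of scales is 4096 because each *output channel* has its own scale within a group; a group is shared only along the input dimension, not across output channels.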
@MuYu-zhi please check out the GEMM linear module. All weights are packed in a special way that is tied to how the CUDA kernels execute.
Is there any detailed explanation of the pack operation, or a code fragment in this repo that implements it?
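As a rough illustration of what the GEMM path does, here is a minimal NumPy sketch of interleaved 4-bit packing: eight 4-bit values go into one int32, but in the interleaved order [0, 2, 4, 6, 1, 3, 5, 7] rather than sequentially, so the kernel can dequantize pairs efficiently. The function name and exact order here are assumptions for illustration, not AutoAWQ's actual API:

```python
import numpy as np

# Assumed interleave order used when packing 8 nibbles into an int32.
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]

def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack 4-bit values (0..15) along the last axis, 8 per int32.

    The value at logical index AWQ_ORDER[i] within each group of 8
    lands in bit positions [4*i, 4*i+4) of the packed int32.
    """
    assert values.shape[-1] % 8 == 0
    v = values.reshape(*values.shape[:-1], -1, 8).astype(np.int32)
    packed = np.zeros(v.shape[:-1], dtype=np.int32)
    for shift_pos, src in enumerate(AWQ_ORDER):
        packed |= v[..., src] << (4 * shift_pos)
    return packed

# Example: a single row of 8 nibbles packs into one int32.
row = np.arange(8).reshape(1, 8)
packed = pack_int4(row)
```

This is why qzeros reads [86, 512] instead of [86, 4096]: the int32 storage holds eight 4-bit zero-points per element, in the interleaved layout the kernel expects. For the exact implementation, see the packing code in AutoAWQ's quantized linear module.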