GPTQ-for-LLaMa
About the granularity of weight quantization
Hi, I'm confused about the granularity of weight quantization. For example, given a weight matrix W of size [4096, 4096] and a groupsize of 128, I expected per-channel quantization to produce a scale of dimension [4096], but instead I get a scale of dimension [4096, 32]. This makes it difficult for me to follow the dequantization. Could someone help me clear up this confusion?
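For reference, here is a minimal sketch (my own illustration, not the repo's actual quantizer) of the group-wise scheme I believe is being applied: each row of 4096 input channels is split into 4096 / 128 = 32 groups, and every group gets its own scale, which is where the [4096, 32] shape comes from instead of a per-row [4096] scale.

```python
import torch

def groupwise_quantize(W: torch.Tensor, groupsize: int = 128, bits: int = 4):
    """Sketch of symmetric group-wise quantization (illustrative only).

    W has shape [out_features, in_features]. Each row is split into
    in_features // groupsize groups along the input dimension, and each
    group gets its own scale, so the scale tensor has shape
    [out_features, in_features // groupsize], e.g. [4096, 32] here.
    """
    out_features, in_features = W.shape
    n_groups = in_features // groupsize
    qmax = 2 ** (bits - 1) - 1  # symmetric integer range, e.g. [-7, 7] for 4 bits

    # View each row as 32 groups of 128 weights: [out, n_groups, groupsize]
    Wg = W.reshape(out_features, n_groups, groupsize)

    # One scale per (row, group): shape [out_features, n_groups]
    scale = Wg.abs().amax(dim=-1) / qmax          # [4096, 32]
    q = torch.round(Wg / scale.unsqueeze(-1)).clamp(-qmax, qmax)

    # Dequantize with the matching per-group scale and restore the layout
    W_hat = (q * scale.unsqueeze(-1)).reshape(out_features, in_features)
    return q, scale, W_hat

W = torch.randn(4096, 4096)
q, scale, W_hat = groupwise_quantize(W, groupsize=128)
print(scale.shape)  # torch.Size([4096, 32]) -- one scale per group, not per row
```

Is this the intended behavior, i.e. should dequantization always index the scale by both the output channel and the group of the input channel?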