GPTQ-for-LLaMa
About the granularity of weight quantization
Hi, I'm confused about the granularity of weight quantization. For example, given a weight matrix W of size [4096, 4096] and a groupsize of 128, I expected per-channel quantization to produce a scale of dimension [4096], but instead I get a scale of dimension [4096, 32]. This makes it difficult for me to follow the dequantization. Could someone help me clear up this confusion?
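For reference, here is a minimal sketch (my own illustration, not the repo's actual quantizer) of the group-wise scheme I believe is being applied: each row of 4096 input channels is split into 4096 / 128 = 32 groups, and every group gets its own scale, which is where the [4096, 32] shape comes from instead of a per-row [4096] scale.

```python
import torch

def groupwise_quantize(W: torch.Tensor, groupsize: int = 128, bits: int = 4):
    """Sketch of symmetric group-wise quantization (illustrative only).

    W has shape [out_features, in_features]. Each row is split into
    in_features // groupsize groups along the input dimension, and each
    group gets its own scale, so the scale tensor has shape
    [out_features, in_features // groupsize], e.g. [4096, 32] here.
    """
    out_features, in_features = W.shape
    n_groups = in_features // groupsize
    qmax = 2 ** (bits - 1) - 1  # symmetric integer range, e.g. [-7, 7] for 4 bits

    # View each row as 32 groups of 128 weights: [out, n_groups, groupsize]
    Wg = W.reshape(out_features, n_groups, groupsize)

    # One scale per (row, group): shape [out_features, n_groups]
    scale = Wg.abs().amax(dim=-1) / qmax          # [4096, 32]
    q = torch.round(Wg / scale.unsqueeze(-1)).clamp(-qmax, qmax)

    # Dequantize with the matching per-group scale and restore the layout
    W_hat = (q * scale.unsqueeze(-1)).reshape(out_features, in_features)
    return q, scale, W_hat

W = torch.randn(4096, 4096)
q, scale, W_hat = groupwise_quantize(W, groupsize=128)
print(scale.shape)  # torch.Size([4096, 32]) -- one scale per group, not per row
```

Is this the intended behavior, i.e. should dequantization always index the scale by both the output channel and the group of the input channel?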