GPTQ-for-LLaMa Total parameters are less after quantization

Open ZN1010 opened this issue 10 months ago • 1 comment

After quantizing LLaMA2-7b, I noticed that the total parameter count of the quantized model is around 1.1B, while the original dense model has around 6.7B parameters. It looks as if the code also prunes the LLM weights. Any idea why weights are additionally removed?

Thanks a lot!

Dec 21 '24 05:12 ZN1010
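
One possible explanation, sketched here rather than confirmed against this repository: if the quantized linear layers store their 4-bit weights packed into 32-bit integer tensors, each stored element holds eight weights, so a count based on tensor element sizes shrinks by about 8x for those layers even though no weight is pruned. The arithmetic below assumes the published LLaMA-2-7B shapes (hidden size 4096, MLP size 11008, 32 layers, vocabulary 32000), 4-bit quantization, and that the embeddings, `lm_head`, and norm weights stay unquantized. It ignores per-group scales and zero points, which would add a little more.

```python
# Illustrative arithmetic only. The shapes are assumed from the published
# LLaMA-2-7B config, and the int32 packing is an assumption about the
# quantized checkpoint layout, not something stated in this issue.
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32
VOCAB = 32000
BITS = 4          # assumed quantization width
WORD_BITS = 32    # assumed storage dtype for packed weights (int32)


def count_elements():
    """Return (dense_total, packed_total) element counts."""
    attn = 4 * HIDDEN * HIDDEN            # q, k, v, o projections
    mlp = 3 * HIDDEN * INTERMEDIATE       # gate, up, down projections
    linear = LAYERS * (attn + mlp)        # weights that get quantized

    # Assumed to stay in floating point: embeddings, lm_head, RMSNorm weights.
    unquantized = 2 * VOCAB * HIDDEN + (2 * LAYERS + 1) * HIDDEN

    weights_per_word = WORD_BITS // BITS  # 8 four-bit weights per int32
    packed_linear = linear // weights_per_word

    return linear + unquantized, packed_linear + unquantized


dense_total, packed_total = count_elements()
print(f"dense elements:  {dense_total / 1e9:.2f}B")
print(f"packed elements: {packed_total / 1e9:.2f}B")
```

Under these assumptions the dense count comes to about 6.74B and the packed count to about 1.07B, which is close to the numbers reported above. The number of logical weights is unchanged; only the number of stored tensor elements drops.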
