GPTQ-for-LLaMa
Total parameter count is lower after quantization
After quantizing LLaMA2-7b, I noticed that the total parameter count of the quantized model is around 1.1B, while the original dense model has around 6.7B parameters. It seems that the code also prunes the LLM weights. Any idea why weights are additionally removed?
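One possibility worth ruling out, offered as a rough sketch rather than a confirmed diagnosis: if the quantized linear layers store their 4-bit weights packed eight to an int32 element, then counting tensor elements would report about one eighth of the original linear weights, with nothing actually pruned. The arithmetic below assumes 4-bit quantization, the published LLaMA-2-7B shapes (hidden size 4096, intermediate size 11008, 32 layers, vocabulary 32000), and unquantized embeddings, output head, and norms. It ignores scales and zero points, and the function names are hypothetical.

```python
# Back-of-the-envelope element count, NOT taken from the repository's code.
# Assumptions: 4-bit weights packed into int32, published LLaMA-2-7B shapes,
# embeddings / lm_head / norms left unquantized, scales and zeros ignored.
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32
VOCAB = 32000
BITS = 4              # assumed quantization width
PACK = 32 // BITS     # number of 4-bit values stored per int32 element


def linear_weights_per_layer() -> int:
    """Weights in the linear layers of one transformer block."""
    attn = 4 * HIDDEN * HIDDEN          # q, k, v, o projections
    mlp = 3 * HIDDEN * INTERMEDIATE     # gate, up, down projections
    return attn + mlp


def unquantized_elements() -> int:
    """Embedding, output head, and RMSNorm weights (assumed left dense)."""
    embed_and_head = 2 * VOCAB * HIDDEN
    norms = (2 * LAYERS + 1) * HIDDEN   # two norms per block plus the final norm
    return embed_and_head + norms


def dense_total() -> int:
    """Element count of the original dense model."""
    return LAYERS * linear_weights_per_layer() + unquantized_elements()


def packed_total() -> int:
    """Element count when linear weights are packed PACK values per int32."""
    return LAYERS * linear_weights_per_layer() // PACK + unquantized_elements()


if __name__ == "__main__":
    print(f"dense:  {dense_total() / 1e9:.2f}B")
    print(f"packed: {packed_total() / 1e9:.2f}B")
```

Under these assumptions the dense count comes to about 6.74B elements and the packed count to about 1.07B, which is close to the numbers I observed. If this is what is happening, the drop would reflect the storage layout rather than removed weights.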
Thanks a lot!