GPTQ-for-LLaMa
6-bit quantization
Quantization causes more quality loss for smaller models than for larger ones. Could the repository try 6-bit quantization with 128 groups for models like LLaMa-7B? This could also be useful for some of the smaller language networks used in Stable Diffusion.
Yes, 6-bit would work great for 13B and below; the extra precision would keep the quantized model noticeably smarter.
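
For anyone curious what "6-bit / 128 groups" actually means for a weight matrix, here is a minimal round-to-nearest sketch in PyTorch. This is not the GPTQ algorithm (which additionally corrects quantization error using second-order information) and not this repository's API; the function name, parameters, and defaults are illustrative only.

```python
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 6, group_size: int = 128) -> torch.Tensor:
    """Group-wise asymmetric round-to-nearest quantization; returns the dequantized copy."""
    out_features, in_features = weight.shape
    assert in_features % group_size == 0, "in_features must be divisible by group_size"
    qmax = 2 ** bits - 1  # 63 for 6-bit

    # Each group of 128 input weights shares one scale and zero-point.
    w = weight.reshape(out_features, in_features // group_size, group_size)
    w_min = w.amin(dim=-1, keepdim=True)
    w_max = w.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero = torch.round(-w_min / scale)

    # Quantize to integers in [0, qmax], then map back to floats.
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    w_dq = (q - zero) * scale
    return w_dq.reshape(out_features, in_features)

# Rough check of reconstruction error on a random matrix of LLaMa-7B-ish size:
w = torch.randn(4096, 4096)
print(f"mean abs error: {(w - quantize_rtn(w)).abs().mean():.6f}")
```

Higher bit widths and smaller group sizes both shrink the reconstruction error, at the cost of a larger quantized file, which is the trade-off being suggested here for the 7B/13B models.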