llama-chat
Do you have any plans to support GPTQ 4-bit models?
It seems that GPTQ 4-bit models are already supported in this project: https://github.com/qwopqwop200/GPTQ-for-LLaMa
I'd also like to know how to do this. The primary bottleneck seems to be how fast the layers can be fed to the GPU: my copy load sits at 80% while GPU load is only at 10%. Is there a way to improve this somehow? I assume that if we can quantize the layers down to a quarter of their size, it would be almost 4x faster.
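For what it's worth, here is a rough back-of-envelope sketch of why 4-bit weights should help when the copy engine is the bottleneck. The model size and PCIe bandwidth below are assumed figures for illustration, not measurements from this project:

```python
# Rough check of the "almost 4x faster" intuition: if generation is bound by
# copying layer weights to the GPU, transfer time scales with bytes per
# parameter. PARAMS and PCIE_GBPS are illustrative assumptions.

PARAMS = 7e9        # assumed 7B-parameter LLaMA model
PCIE_GBPS = 16e9    # assumed ~16 GB/s effective PCIe bandwidth

for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4 (GPTQ)", 0.5)]:
    total_bytes = PARAMS * bytes_per_param
    seconds = total_bytes / PCIE_GBPS
    print(f"{label:12s} ~{total_bytes / 1e9:5.1f} GB of weights, "
          f"~{seconds:4.2f} s just to stream them over the bus")
```

Under those assumptions, fp16 weights take roughly four times as long to stream as 4-bit weights, which is where the ~4x figure comes from; actual speedup also depends on compute and dequantization overhead.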
> It seems that GPTQ 4-bit models are already supported in this project: https://github.com/qwopqwop200/GPTQ-for-LLaMa
That project is meant for the bare weights.