marlin
Can this support lower-bit quantization, e.g. 3-bit or 2-bit?
I am also curious about this.
Hi,
currently Marlin supports only a limited set of quantization options (4-bit with group size 128), chosen for a good accuracy/speed trade-off; as a result, it runs very close to peak efficiency in many cases, including at larger batch sizes.
That said, Marlin can definitely be a good starting point for developing highly efficient kernels for other bit-widths or quantization schemes.
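To make the "4-bit + group size 128" scheme concrete, here is a minimal NumPy sketch of symmetric per-group quantization. This is illustrative only: Marlin's actual weight packing, zero-point handling, and CUDA kernel are far more involved, and the function names here are made up for the example.

```python
import numpy as np

def quantize_groupwise(w, bits=4, group_size=128):
    # Symmetric per-group quantization: each run of `group_size`
    # consecutive weights shares one floating-point scale.
    # (Illustrative sketch -- not Marlin's real packing format.)
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed 4-bit
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax)
    return q.astype(np.int8), scales

def dequantize(q, scales):
    # Multiply each group by its scale to recover approximate weights.
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_groupwise(w, bits=4, group_size=128)
w_hat = dequantize(q, s)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Changing `bits` here shows why lower bit-widths are harder: at 3-bit or 2-bit the per-group quantization error grows quickly, which is part of the accuracy/speed trade-off mentioned above.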
How would one go about making it work for 8-bit GPTQ?