marlin
Can this support lower-bit quantization, e.g. 3-bit or 2-bit?
I am also curious about this.
Hi,
currently Marlin supports only a limited set of quantization options (4-bit with group size 128), chosen for a good accuracy/speed trade-off; as a result, it runs very close to peak efficiency in many cases, including at larger batch sizes.
That said, Marlin can definitely be a good starting point for developing highly efficient kernels for other bit-widths or quantization schemes.
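To make the "4-bit + group size 128" scheme concrete, here is a minimal NumPy sketch of symmetric per-group quantization. This is illustrative only: Marlin's actual weight packing, zero-point handling, and CUDA kernel are far more involved, and the function names here are made up for the example.

```python
import numpy as np

def quantize_groupwise(w, bits=4, group_size=128):
    # Symmetric per-group quantization: each run of `group_size`
    # consecutive weights shares one floating-point scale.
    # (Illustrative sketch -- not Marlin's real packing format.)
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed 4-bit
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax)
    return q.astype(np.int8), scales

def dequantize(q, scales):
    # Multiply each group by its scale to recover approximate weights.
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_groupwise(w, bits=4, group_size=128)
w_hat = dequantize(q, s)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Changing `bits` here shows why lower bit-widths are harder: at 3-bit or 2-bit the per-group quantization error grows quickly, which is part of the accuracy/speed trade-off mentioned above.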
How would one go about making it work for 8-bit GPTQ?