
Support 3-bit and 2-bit quantization with the FLUTE kernel.

Open • radi-cho opened this issue 1 year ago • 4 comments

Hi,

I would like to propose adding the FLUTE kernel as a backend for fast inference with 3-bit and 2-bit quantized weights. Concretely, a FluteLinear module, together with its corresponding 3-bit and 2-bit weight packing, could serve as a new quantized linear implementation. It could then be substituted for the existing linear layers throughout the rest of the AutoAWQ codebase. If the maintainers think this would be a valuable addition, I am happy to open a pull request.
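To make "3-bit and 2-bit packing" concrete, here is a minimal sketch of generic LSB-first bit packing of low-bit integer codes into 32-bit words. It is an illustration under assumptions only: `pack_bits` and `unpack_bits` are hypothetical helpers, not part of AutoAWQ or FLUTE, and this flat layout does not reproduce the weight layout the FLUTE kernel actually expects.

```python
from typing import List


def pack_bits(values: List[int], bits: int) -> List[int]:
    """Hypothetical helper: pack unsigned `bits`-wide codes into 32-bit words, LSB-first.

    Codes may straddle word boundaries (e.g. 3-bit codes), since 32 is not a multiple of 3.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in [1, 8]")
    mask = (1 << bits) - 1
    stream = 0  # Python ints are arbitrary precision, so build one long bitstream
    for i, v in enumerate(values):
        if not 0 <= v <= mask:
            raise ValueError(f"value {v} does not fit in {bits} bits")
        stream |= v << (i * bits)
    n_words = (len(values) * bits + 31) // 32  # round up to whole 32-bit words
    return [(stream >> (32 * w)) & 0xFFFFFFFF for w in range(n_words)]


def unpack_bits(words: List[int], bits: int, count: int) -> List[int]:
    """Hypothetical helper: inverse of pack_bits, recovering the first `count` codes."""
    stream = 0
    for w, word in enumerate(words):
        stream |= (word & 0xFFFFFFFF) << (32 * w)
    mask = (1 << bits) - 1
    return [(stream >> (i * bits)) & mask for i in range(count)]


if __name__ == "__main__":
    codes = [i % 8 for i in range(32)]  # 32 three-bit codes = 96 bits
    packed = pack_bits(codes, 3)
    print(len(packed))  # 3 words
    assert unpack_bits(packed, 3, len(codes)) == codes
```

With this layout, 16 two-bit codes fit in exactly one 32-bit word, while 32 three-bit codes need three words because individual codes cross word boundaries. A real GPU kernel would typically choose a layout that avoids or amortizes that straddling.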

cc @HanGuo97

radi-cho, Aug 01 '24 07:08