AutoAWQ
Support 3-bit and 2-bit quantization with the FLUTE kernel.
Hi,
I would like to propose adding the FLUTE kernel as a backend for fast 3-bit and 2-bit quantization. I think we could implement a FluteLinear module, with its corresponding 3-bit and 2-bit packing, as a new linear layer implementation, and then substitute it into the rest of the AutoAWQ codebase. If the maintainers believe this would be a valuable addition, I volunteer to open a pull request.
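To make the packing part concrete, here is a minimal NumPy sketch of how 3-bit or 2-bit codes could be stored densely in bytes. It is only an illustration under my own assumptions: the helper names `pack_bits` and `unpack_bits` are hypothetical, and the bit layout shown (little-endian, contiguous) is not claimed to match the layout the FLUTE kernel actually expects, which would need to follow FLUTE's own packing utilities.

```python
import numpy as np


def pack_bits(values: np.ndarray, bits: int) -> np.ndarray:
    """Pack unsigned codes of width `bits` into a flat uint8 array.

    Hypothetical helper: bits are laid out contiguously, least significant
    bit first. This is NOT necessarily the layout FLUTE uses.
    """
    values = np.asarray(values, dtype=np.uint8).ravel()
    if values.size and int(values.max()) >= (1 << bits):
        raise ValueError(f"values do not fit in {bits} bits")
    # Expand every code into its `bits` lowest bits, LSB first.
    shifts = np.arange(bits, dtype=np.uint8)
    bit_matrix = (values[:, None] >> shifts) & 1
    # packbits pads the final byte with zeros when needed.
    return np.packbits(bit_matrix.ravel().astype(np.uint8), bitorder="little")


def unpack_bits(packed: np.ndarray, bits: int, count: int) -> np.ndarray:
    """Recover `count` codes of width `bits` from the output of `pack_bits`."""
    flat = np.unpackbits(packed, bitorder="little")[: count * bits]
    bit_matrix = flat.reshape(count, bits)
    weights = (1 << np.arange(bits)).astype(np.uint8)
    return (bit_matrix * weights).sum(axis=1).astype(np.uint8)


# Eight 3-bit codes occupy 24 bits, i.e. 3 bytes instead of 8.
codes_3bit = np.arange(8, dtype=np.uint8)
packed_3bit = pack_bits(codes_3bit, bits=3)
restored_3bit = unpack_bits(packed_3bit, bits=3, count=codes_3bit.size)

# Four 2-bit codes fit in a single byte.
codes_2bit = np.array([3, 0, 1, 2], dtype=np.uint8)
packed_2bit = pack_bits(codes_2bit, bits=2)
restored_2bit = unpack_bits(packed_2bit, bits=2, count=codes_2bit.size)
```

A FluteLinear module would presumably keep the packed weights as a buffer and hand them to the kernel in its forward pass; the sketch above covers only the storage idea, not the kernel call.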
cc @HanGuo97