onnxruntime
[quant] Support act_order inputs in MatMulNBits and add new quantization algorithm "hqq"
Description
- Support quantized GPTQ weights from Hugging Face models such as TheBloke/Llama-2-7B-Chat-GPTQ
- Support act_order for GPTQ
- Support the HQQ algorithm for quantizing MatMul weights, and add a quantization script
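
To illustrate what act_order changes for the consumer of the weights, here is a minimal NumPy sketch of group-wise dequantization driven by a per-row group index. It is not the MatMulNBits kernel or the code in this PR: the function name `dequantize_with_g_idx`, the unpacked `(K, N)` layout, and the `g_idx` construction are assumptions made for illustration, following the common GPTQ convention that `g_idx[i]` names the quantization group of input row `i`.

```python
import numpy as np


def dequantize_with_g_idx(q, scales, zeros, g_idx):
    """Hypothetical helper: dequantize unpacked integer codes group-wise.

    q:      (K, N) integer codes (e.g. 4-bit values stored unpacked)
    scales: (G, N) per-group scales
    zeros:  (G, N) per-group zero points
    g_idx:  (K,)   group index of each input row
    """
    # Fancy indexing gathers the scale and zero point of each row's group.
    return scales[g_idx] * (q.astype(np.float32) - zeros[g_idx])


rng = np.random.default_rng(0)
K, N, group_size = 8, 4, 4
G = K // group_size

q = rng.integers(0, 16, size=(K, N))
scales = rng.uniform(0.01, 0.1, size=(G, N)).astype(np.float32)
zeros = np.full((G, N), 8, dtype=np.float32)

# Without act_order, rows are grouped in their natural order: 0,0,0,0,1,1,1,1.
g_idx_sequential = np.arange(K) // group_size

# With act_order, rows are quantized in a permuted order (assumed here to be
# random; GPTQ derives it from activation statistics), so the group of row
# perm[k] is k // group_size and groups are no longer contiguous.
perm = rng.permutation(K)
g_idx_act_order = np.empty(K, dtype=np.int64)
g_idx_act_order[perm] = np.arange(K) // group_size

w_sequential = dequantize_with_g_idx(q, scales, zeros, g_idx_sequential)
w_act_order = dequantize_with_g_idx(q, scales, zeros, g_idx_act_order)
```

Passing the group index as an explicit input lets the same dequantization path serve both cases: the sequential index reproduces plain block-wise grouping, while a permuted index handles act_order checkpoints without reordering the stored weights.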