onnxruntime

[quant] Support act_order inputs in MatMulNBits and add new quantization algorithm "hqq"

Open wejoncy opened this issue 1 year ago • 0 comments

Description

  1. Support quantized GPTQ weights from Hugging Face, such as TheBloke/Llama-2-7B-Chat-GPTQ
  2. Support act_order for GPTQ
  3. Support the HQQ algorithm for quantizing MatMul weights, and add a quantization script
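
To illustrate what act_order changes for item 2, here is a minimal NumPy sketch of group-wise dequantization driven by a per-row group index. It is not the MatMulNBits kernel or its input layout. The function name `dequantize_with_g_idx`, the array shapes, and the unpacked integer codes are all assumptions made for illustration, and zero-point conventions differ between GPTQ exporters.

```python
import numpy as np


def dequantize_with_g_idx(q, scales, zeros, g_idx):
    """Hypothetical helper: dequantize group-quantized weights.

    q      : (K, N) integer codes, already unpacked (assumption)
    scales : (G, N) per-group scales
    zeros  : (G, N) per-group zero points
    g_idx  : (K,)   group id of each input row
    """
    # Each row picks up the scale and zero point of the group it belongs to.
    return scales[g_idx] * (q.astype(np.float32) - zeros[g_idx])


K, N, group_size = 8, 2, 4
rng = np.random.default_rng(0)
q = rng.integers(0, 16, size=(K, N))  # 4-bit codes in [0, 15]
scales = np.array([[0.5, 0.25], [2.0, 1.0]], dtype=np.float32)
zeros = np.full((2, N), 8.0, dtype=np.float32)

# Without act_order, groups are contiguous blocks of rows.
g_idx_default = np.arange(K) // group_size

# With act_order, rows were quantized in a reordered sequence, so group
# membership is scattered and has to be looked up explicitly (made-up values).
g_idx_act_order = np.array([1, 0, 0, 1, 1, 0, 1, 0])

w_default = dequantize_with_g_idx(q, scales, zeros, g_idx_default)
w_act_order = dequantize_with_g_idx(q, scales, zeros, g_idx_act_order)
```

With contiguous groups, a kernel can derive the group from the row index alone. Once the index is scattered, it has to be passed in as an extra input, which is what supporting act_order inputs implies.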

Motivation and Context

wejoncy · Jan 12 '24 07:01