Support Microsoft contrib ONNX operator MatMulNBits

Open hgaspar opened this issue 6 months ago • 0 comments

This operator appears in LLM models quantized to int4 via the genai tool (these models also contain GroupQueryAttention nodes).

Only N=4 (i.e., 4 bits) needs to be supported in the near term.

For reference, see the operator description in:

https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulNBits

Here, "support" means:

  1. The ability to parse models that contain it.
  2. An implementation of it via known operators.
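
To make point 2 concrete, here is a minimal pure-Python sketch of how the 4-bit case could decompose into unpack, dequantize, and matmul steps. It is not MIGraphX code, and the function names are hypothetical. It assumes my reading of the linked operator description: `B` is stored as `[N, n_blocks_per_col, block_size / 2]` bytes, two 4-bit values are packed per byte with the first value in the low nibble, `scales` holds one value per block, and the zero point defaults to 8 when no `zero_points` input is given. Please check these against the ContribOperators page before relying on them.

```python
def unpack_uint4(packed):
    """Split each byte into two 4-bit values, low nibble first (assumed order)."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


def dequantize_b(b_packed, scales, K, N, block_size, zero_point=8):
    """Return B as N rows of K floats, using (q - zero_point) * scale per block."""
    blocks_per_col = (K + block_size - 1) // block_size
    bytes_per_block = block_size // 2  # two 4-bit values per byte
    rows = []
    for n in range(N):
        row = []
        for blk in range(blocks_per_col):
            start = (n * blocks_per_col + blk) * bytes_per_block
            q = unpack_uint4(b_packed[start:start + bytes_per_block])
            scale = scales[n * blocks_per_col + blk]
            row.extend((v - zero_point) * scale for v in q)
        rows.append(row[:K])  # drop padding in the last block, if any
    return rows


def matmul_nbits(a, b_packed, scales, K, N, block_size):
    """Y = A @ dequant(B)^T, with A given as M rows of K floats."""
    b = dequantize_b(b_packed, scales, K, N, block_size)
    return [[sum(x * w for x, w in zip(a_row, b_row)) for b_row in b]
            for a_row in a]
```

In a graph compiler, the same three steps would map onto existing unpack, dequantize, and dot-style operators, so the parser could emit that subgraph instead of a dedicated kernel. Per-block zero points (the optional `zero_points` input) are left out of this sketch.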

hgaspar, Aug 19 '24 18:08