AMDMIGraphX
Support Microsoft contrib ONNX operator MatMulNBits
This operator appears in LLM models quantized to int4 via the genai tool (such models also contain GroupQueryAttention nodes).
Only N=4 (i.e. 4-bit weights) needs to be supported in the near term.
For reference, see the operator description in:
https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulNBits
Here, "support" means:
- Parsing models that contain it.
- Implementing it via known operators.
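
For illustration only, the decomposition into known operators (unpack, dequantize, transpose, matmul) can be sketched in NumPy. This is not MIGraphX code; the function names are hypothetical. It assumes the layout described in the contrib operator doc linked above: B is packed uint8 with shape [N, n_blocks_per_col, block_size / 2] for 4 bits, scales has one entry per block, and two 4-bit values share a byte with the low nibble first. It also assumes a single default zero point of 8 when no zero_points input is given; please check both assumptions against the doc.

```python
import numpy as np

def unpack_int4(packed):
    """Split each uint8 into two 4-bit values, low nibble first (assumed order)."""
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    # Interleave low/high so element 2i comes from the low nibble of byte i.
    return np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)

def matmul_4bits(a, b_packed, scales, K, N, block_size, zero_point=8):
    """Y = A @ dequant(B)^T, built only from elementwise ops, reshape and matmul."""
    n_blocks = (K + block_size - 1) // block_size
    q = unpack_int4(b_packed.reshape(N, n_blocks, block_size // 2))
    # Blockwise dequantization: one scale per (output column, block).
    w = (q.astype(np.float32) - zero_point) * scales.reshape(N, n_blocks, 1)
    # Drop padding past K, then multiply by the transposed weight matrix.
    w = w.reshape(N, n_blocks * block_size)[:, :K]
    return a @ w.T

# Tiny example: K=16, N=2, one block of 16 values per output column.
K, N, block_size = 16, 2, 16
q = (np.arange(N * K) % 16).astype(np.uint8).reshape(N, K)
b_packed = (q[:, 0::2] | (q[:, 1::2] << 4)).astype(np.uint8)
scales = np.array([0.5, 2.0], dtype=np.float32)
a = np.ones((1, K), dtype=np.float32)
y = matmul_4bits(a, b_packed, scales, K, N, block_size)
```

In a graph compiler the same steps would map to existing operators (bit unpacking, a dequantize-style multiply/subtract, transpose, dot), so no dedicated MatMulNBits kernel is strictly required for a first implementation.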