
rfcs: OCP MX dynamic quantization support

Open mgouicem opened this issue 3 months ago • 3 comments

This is a proposal to support the MXFP data types, and in particular dynamic quantization of outputs.

Link for rendered version.

mgouicem avatar Sep 12 '25 08:09 mgouicem

@Sqvid @theComputeKid

mgouicem avatar Sep 12 '25 14:09 mgouicem

Hi all, just pushed an update to the RFC. In a nutshell:

  • added a link to the POC PR for option 1.b (extend set_scales)
  • updated recommendation to option 1.b (extend set_scales)

The main drivers for now recommending extending set_scales are:

  • it unifies scale handling in both the external API and the internals, making the different ways of handling scales explicitly mutually exclusive
  • it should be more robust when extending to new quantization flavors (e.g. static quantization with floating-point zero-points, or dynamic quantization with division by scale before conversion).
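
For readers unfamiliar with the "division by scale before conversion" flavor, here is a minimal sketch of MX-style dynamic quantization for one block. It is not the oneDNN API or the POC implementation: the function names are hypothetical, the block size of 32 and the E4M3 element constants are assumptions taken from my reading of the OCP MX specification, and the final rounding to the element data type is left out.

```python
import math

# Assumptions (not taken from the RFC): MX block of 32 elements,
# E4M3 element type, E8M0 power-of-two shared scale.
BLOCK_SIZE = 32
ELEM_EMAX = 8      # exponent of the largest E4M3 normal (448 = 1.75 * 2**8)
ELEM_MAX = 448.0   # largest finite E4M3 magnitude


def mx_block_scale(block):
    """Hypothetical helper: power-of-two shared scale for one block."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - ELEM_EMAX
    # Keep the exponent inside the assumed E8M0 range.
    shared_exp = max(-127, min(127, shared_exp))
    return 2.0 ** shared_exp


def quantize_block(block):
    """Divide by the shared scale, then clamp to the element range.

    Rounding to the element data type is omitted in this sketch.
    """
    scale = mx_block_scale(block)
    scaled = [max(-ELEM_MAX, min(ELEM_MAX, v / scale)) for v in block]
    return scale, scaled
```

The scale is computed from the data at execution time, which is what makes the quantization dynamic: the caller gets back both the scaled values and the scale that produced them.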

Let me know if there are preferences or other opinions on this. Thanks.

mgouicem avatar Sep 22 '25 13:09 mgouicem

I don't think there are any major comments on our end. Out of interest, here is a link to some similar work that has gone into the TOSA specification. Any future work regarding TOSA and oneDNN interacting would be aided by similar numerical models.

https://git.mlplatform.org/tosa/specification.git/commit/?id=063846a75b9687ab01e58cb3538472bffb3a03b0

Sqvid avatar Oct 10 '25 10:10 Sqvid