rfcs: OCP MX dynamic quantization support
This is a proposal to support the MXFP data types, in particular dynamic quantization of outputs.
Link for rendered version.
@Sqvid @theComputeKid
Hi all, just pushed an update to the RFC. In a nutshell:
- added a link to the POC PR for option 1.b (extend set_scales)
- updated the recommendation to option 1.b (extend set_scales)
The main reasons for now recommending extending set_scales are:
- it unifies scale handling across the external API and the internals, making the different ways of handling scales explicitly mutually exclusive
- it should extend more robustly to new quantization flavors (e.g. static quantization with floating-point zero-points, or dynamic quantization with division by the scale before conversion).
Let me know if there are preferences or other opinions on this. Thanks.
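For readers less familiar with the "division by scale before conversion" flavor mentioned above, here is a minimal sketch of dynamic MX-style quantization of one block. It is not oneDNN code and does not use the set_scales API; `round_to_e4m3` and `quantize_block` are hypothetical helper names. It assumes FP8 E4M3 elements and a power-of-two shared scale derived from the block's maximum magnitude, following my reading of the conversion suggested in the OCP MX specification. NaN/Inf handling is left out.

```python
import math

# Assumed element format: FP8 E4M3 (3 mantissa bits, max normal 448 = 1.75 * 2**8).
E4M3_EMAX = 8
E4M3_EMIN = -6
E4M3_MAX = 448.0
E4M3_MANT_BITS = 3


def round_to_e4m3(x):
    """Round a float to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), E4M3_MAX)           # saturate instead of overflowing
    _, e = math.frexp(mag)                # mag = m * 2**e with m in [0.5, 1)
    exp = max(e - 1, E4M3_EMIN)           # below EMIN we are in the subnormal range
    step = 2.0 ** (exp - E4M3_MANT_BITS)  # spacing of representable values
    return math.copysign(round(mag / step) * step, x)


def quantize_block(values):
    """Return (shared_exp, elements) for one block with a power-of-two shared scale."""
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        # Assumption for this sketch: an all-zero block uses shared_exp = 0.
        return 0, [0.0] * len(values)
    _, e = math.frexp(amax)               # floor(log2(amax)) == e - 1
    shared_exp = (e - 1) - E4M3_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # keep within the E8M0 exponent range
    scale = 2.0 ** shared_exp
    # Division by the scale happens before conversion to the element type.
    return shared_exp, [round_to_e4m3(v / scale) for v in values]
```

Dequantization is then `q * 2.0 ** shared_exp` per element. In the MX formats a block would hold 32 elements; the sketch accepts any length to keep the example short.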
I don't think there are any major comments on our end. Out of interest, here is a link to some similar work that has gone into the TOSA specification. Any future work on TOSA and oneDNN interoperating would be helped by the two using similar numerical models.
https://git.mlplatform.org/tosa/specification.git/commit/?id=063846a75b9687ab01e58cb3538472bffb3a03b0