qonnx Overhaul shape inference for custom ops

Details

Currently, QONNX custom ops need to implement the make_shape_compatible_op function for shape inference. This function is supposed to return a single, standard ONNX node that has the same shape inference behavior as the custom op. Finding a single-node shape inference equivalent can be challenging if the custom op has non-trivial shape inference behavior. Several custom ops work around this by assuming that their input shape is available, computing the desired output shape, and using make_const_shape_op.

However, this requires that the shapes of all inputs to the custom op are already specified. In cases where they are not, bugs arise; e.g. https://github.com/fastmachinelearning/qonnx/pull/152 works around one such bug. We should have a more flexible custom op shape inference system.

New behavior

There are several paths forward which should be evaluated in more detail.

1. Enhance the custom op shape inference system such that custom ops can return multi-node subgraphs instead of just single-node ones. The ONNX abstractions for subgraphs or onnx.compose may come in handy here. The system in InferShapes for replacing custom nodes with single standard nodes will also need an overhaul, so that it can replace a single node with a subgraph and then swap it back again.
2. Keep the current make_shape_compatible_op single-node interface, and keep the assumption that the input shape is available. Rework InferShapes to replace custom ops one at a time in topological order and call ONNX shape inference at each step, instead of replacing all of them at the same time before calling ONNX shape inference.
3. Explore possibilities with PyOp in onnxruntime-extensions to switch out a larger portion of how QONNX handles custom ops. This goes beyond just overhauling shape inference, but may have other benefits. See https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/pytorch_custom_ops_tutorial.ipynb
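
To make the second option above more concrete, here is a minimal pure-Python sketch of the idea. It is a toy model, not QONNX code: the Node class, the "HalveLast" custom op, the "ConstShape" stand-in node and the infer_standard function are all hypothetical and only mimic the roles of a custom op, the node produced by make_const_shape_op, and ONNX shape inference. It shows why replacing all custom ops at once can fail when an intermediate shape is still unknown, and why replacing them one at a time in topological order avoids that.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]


@dataclass
class Node:
    op_type: str                   # "Relu", "ConstShape", or the hypothetical custom op "HalveLast"
    inp: Optional[str]             # name of the single input tensor (None for ConstShape)
    out: str                       # name of the single output tensor
    shape: Optional[Shape] = None  # fixed output shape, only used by ConstShape


def make_shape_compatible_op(node: Node, shapes: Dict[str, Shape]) -> Node:
    """Toy single-node interface: it needs the input shape to be known already."""
    in_shape = shapes[node.inp]  # raises KeyError if the input shape is unknown
    out_shape = in_shape[:-1] + (in_shape[-1] // 2,)
    return Node("ConstShape", None, node.out, out_shape)


def infer_standard(nodes: List[Node], shapes: Dict[str, Shape]) -> None:
    """Toy stand-in for standard shape inference; custom ops are opaque to it."""
    for n in nodes:
        if n.op_type == "ConstShape":
            shapes[n.out] = n.shape
        elif n.op_type == "Relu" and n.inp in shapes:
            shapes[n.out] = shapes[n.inp]


def infer_all_at_once(nodes: List[Node], known: Dict[str, Shape]) -> Dict[str, Shape]:
    """Replace every custom op first, then run inference once."""
    shapes = dict(known)
    replaced = [
        make_shape_compatible_op(n, shapes) if n.op_type == "HalveLast" else n
        for n in nodes
    ]
    infer_standard(replaced, shapes)
    return shapes


def infer_one_at_a_time(nodes: List[Node], known: Dict[str, Shape]) -> Dict[str, Shape]:
    """Replace custom ops in topological order, running inference after each one."""
    shapes = dict(known)
    work = list(nodes)
    for i, n in enumerate(nodes):
        if n.op_type == "HalveLast":
            work[i] = make_shape_compatible_op(n, shapes)
            infer_standard(work, shapes)
    infer_standard(work, shapes)  # covers graphs without any custom op
    return shapes


# x -> HalveLast -> a -> Relu -> b -> HalveLast -> y, only the shape of x is known
graph = [Node("HalveLast", "x", "a"), Node("Relu", "a", "b"), Node("HalveLast", "b", "y")]
known_shapes = {"x": (1, 8)}

try:
    infer_all_at_once(graph, known_shapes)
except KeyError as missing:
    print("all-at-once replacement failed, unknown shape for", missing)

print(infer_one_at_a_time(graph, known_shapes)["y"])
```

In this toy, the second custom op depends on the shape of b, which only becomes known after the first custom op has been replaced and inference has run once. A real implementation would also have to restore the original custom nodes afterwards, which the sketch leaves out.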

Motivation

Avoid shape inference related bugs for custom ops.

Parts of QONNX being affected

Depends on the approach chosen.

Dec 15 '24 21:12 maltanar

Hey, I've been trying out the possibility of using PyOp in onnxruntime-extensions and I've managed to convert the following ops:

Quant
BipolarQuant
Trunc
XnorPopCount

One problem I have encountered is that PyOp does not support attributes that are lists, or at least I have not figured out how to use them. This is a problem for the Im2Col op, which has plenty of attributes that are lists. One solution would be to simply retype these attributes to strings. Other than that, it has been pretty smooth using PyOp.

Here is a small example of how onnx_op works: https://github.com/jurevreca12/onnxruntime-extensions-test and here is my branch where I converted the ops: https://github.com/jurevreca12/qonnx/tree/onnxruntime-extensions

Jan 20 '25 20:01 jurevreca12

I have some follow up comments on this issue.

Firstly, I can confirm that onnxruntime-extensions currently only supports float32, int64 and string attributes. I am starting to realize this will be more of an issue when attributes are lists of floats (e.g. scale factors when quantizing at a high granularity). I managed to add some other ops like Im2Col by translating the list attributes to strings and then back. This works fine, but it feels hacky. For attributes that are lists of floats, however, this will likely not work well, depending on the float serialization.
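
As a rough illustration of the string workaround and of the float concern, here is a small sketch using only the Python standard library. The helper names encode_list_attr and decode_list_attr are hypothetical and are not taken from the linked branch, which may encode the lists differently. The sketch shows that integer lists survive the round trip, and that whether float lists survive depends on how the floats are written out.

```python
import json
from typing import List, Union

Number = Union[int, float]


def encode_list_attr(values: List[Number]) -> str:
    """Hypothetical helper: pack a list attribute into a single string attribute."""
    return json.dumps(values)


def decode_list_attr(text: str) -> List[Number]:
    """Hypothetical helper: unpack the string attribute inside the op implementation."""
    return json.loads(text)


# Integer lists (such as a kernel size) round-trip without any loss.
kernel_size = [3, 3]
assert decode_list_attr(encode_list_attr(kernel_size)) == kernel_size

# Float lists round-trip exactly only if the serialization keeps full precision.
# json.dumps writes Python floats using repr(), which is designed to round-trip.
scales = [0.1, 1.0 / 3.0, 2.5e-7]
exact = decode_list_attr(encode_list_attr(scales))
print(exact == scales)

# A fixed-precision text format silently changes the values.
lossy = [float(f"{s:.4f}") for s in scales]
print(lossy == scales)
```

Note that this only covers Python floats; values that are converted to float32 on the way in or out of the runtime may need additional care, which the sketch does not address.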

So to pursue option 3 further, I see these options:

Use some kind of protobuf serialization (convert list to google.protobuf.pyext._message.RepeatedScalarContainer) to convert list attributes to prototext and then back in the op implementations. This will look ugly when inspecting a saved QONNX graph.
Request a feature (or implement a PR) to extend support in onnxruntime-extensions. This would involve writing some C++ code, so it might be somewhat involved. There is also an issue about this in the onnxruntime-extensions repo, but it seems that the maintainers are not active (https://github.com/microsoft/onnxruntime-extensions/issues/785).
Instead of using attributes, move to using ONNX initializers. It seems that these do support arrays. But this would impact other projects, like Brevitas, which would have to adapt their exporters. Also, I am not really sure why ONNX has both attributes and initializers, or whether this would have any other negative impacts.

Jan 21 '25 15:01 jurevreca12

Thanks for sharing the results of your investigation, @jurevreca12! Another emerging option is to use functions in ONNX instead of the custom PyOp from onnxruntime-extensions. I'm planning to look at this over the summer.

Jun 17 '25 15:06 maltanar