Tracking issue for unsupported parts of the standard
There are some parts of the standard that we don't yet support, or cannot test due to their limited usage, but it's worth writing these down in case they come up.
Unticked features written in bold have (unstable and possibly partial) internal implementations and test suites; ticked features are the stable ones.
Features in this list without their own issue are not significant enough to be a blocker for a release.
- [ ] Functions - Limited testing due to limited ORT support
- [ ] Initialisers - Replaceable by `Constant`; initialisers as input defaults are unsupported in ORT. #50
- [ ] External tensor data - Currently, values that become a `TensorProto` must be kept in memory as a `numpy.ndarray`. How will this affect running type inference, e.g. when attributes are expected?
- [ ] Sparse tensors - Mostly seem to be used in initialisers, but currently their appearances in e.g. `Constant` are ignored.
- [ ] Map value type - Though in the standard and undoubtedly useful, I'm not sure if it's possible to create/use one.
- [ ] Non-standard (non-numpy) data types - Though unmentioned (?) in the standard directly, some operators accept dtypes like `bfloat16`. We would need to slightly modify our representation to accept this.
- [ ] Custom naming - Note this isn't strictly required by the standard for producers, but it sometimes comes up (e.g. for partial output evaluation). Since Spox tries to keep errors from reaching ONNX checkers, it autogenerates all names. It could be useful to allow users to apply their own naming, or access Spox's, if they really wish to.
- [ ] Training, differentiability - This seems to be mostly a preview feature.
An update: I did in fact encounter a Map in the wild - see the discussion here. skl2onnx occasionally generates them, even though they are essentially useless, as shape inference in ONNX used to crash whenever they were encountered (which I fixed here). In any case, it might be time to implement them soon.
Some thoughts on missing features for future consideration:
- Functions: should be easy if we change the approach from the elaborate class-based `Function` and instead just focus on allowing inlining of independent `FunctionProto`s. That should be easy enough to implement, and seems somewhat relevant as functions are getting new features in ONNX.
- External tensor data: would be useful to implement if we want to allow large deep learning models. I think we could take a hybrid approach, allowing either just passing a path (with the value not being explicitly loaded into memory), or marking an attribute as one to be saved externally.
- Sparse tensors: really just a matter of deciding on the representation, but it would be nice to allow passing `onnx.SparseTensorProto` (which allows storing the values) in the first instance, to have this off the list.
- Non-numpy data types: `float16` and other quantised datatypes are slowly entering the standard. However, before any serious Spox support, we probably need the in-`onnx` representation to be improved, as the current one (just pretending it's a `float32`) won't really work. Perhaps jax-ml/ml_dtypes could help, as it's standalone and could be an optional Spox dependency?
- Custom naming: I think it should be OK to implement a facility of 'weak' names - like `Var.label('here')`, which would modify the `Var` in place to be named `'here'` in the model, resolving conflicts by enumerating as usual (`here_0`, `here_1`, etc.). I think this could also make debugging Spox internals much easier if we had a facility like this built in. Left for consideration and a potential issue to open.
- Training: I'm not sure if this is applicable to Spox in the first place.
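The weak-naming idea could be prototyped independently of the rest of Spox. `WeakNamer` below is a hypothetical sketch (not an existing Spox API) of the conflict-resolution-by-enumeration behaviour described above:

```python
from collections import defaultdict


class WeakNamer:
    """Hands out requested labels, enumerating duplicates as name_0, name_1, ..."""

    def __init__(self) -> None:
        self._counts: defaultdict[str, int] = defaultdict(int)

    def assign(self, label: str) -> str:
        n = self._counts[label]
        self._counts[label] += 1
        # The first request keeps the plain label; later ones get a suffix.
        return label if n == 0 else f"{label}_{n - 1}"


namer = WeakNamer()
first = namer.assign("here")   # "here"
second = namer.assign("here")  # "here_0"
```

A `Var.label` implementation along these lines would only need to consult such a registry at build time, keeping the autogenerated names for unlabelled variables.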
I'm happy to tick some of these off the list later in June - it could help with integrating Spox with other ONNX projects.