RFC5: clarify delineation, if any, between "root" transformations and "child" transformations

Open clbarnes opened this issue 2 months ago • 4 comments

The byDimension, inverseOf, sequence, and bijection transformations must each contain one or more "child" transformations. Child transformations have some constraints relaxed and some constraints added, in a way which I found confusing in trying to implement the metadata scheme.

I think that it boils down to

  • root transformations MUST have an input/output coordinate system
  • child transformations MAY have an input/output coordinate system, which
    • MUST be internally consistent
    • MUST NOT conflict with the parent
    • MUST be present if it's a byDimension (why?)
  • any transformation may be a child of any other transformation, except sequences, which MUST NOT be a child of a sequence
    • however, presumably a sequence is allowed as a grandchild of a sequence, e.g. if a sequence contained a byDimension and that contained a sequence
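The nesting rule in the last bullet can be sketched as a small recursive check. This is only an illustration, not something taken from the RFC text: it assumes transformations are plain dicts with a `type` key, and the child-holding key names (`transformations`, `transformation`, `forward`, `inverse`) and both helper functions are hypothetical.

```python
from typing import Any

# Assumed key names for illustration; the RFC's actual field names may differ.
CHILD_LIST_KEY = "transformations"  # e.g. sequence, byDimension
CHILD_SINGLE_KEYS = ("transformation", "forward", "inverse")  # e.g. inverseOf, bijection


def children(t: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect the direct child transformations of `t`."""
    found = list(t.get(CHILD_LIST_KEY, []))
    found += [t[k] for k in CHILD_SINGLE_KEYS if isinstance(t.get(k), dict)]
    return found


def sequence_nesting_errors(t: dict[str, Any], path: str = "root") -> list[str]:
    """Return the paths at which a sequence is a *direct* child of a sequence."""
    errors = []
    for i, child in enumerate(children(t)):
        child_path = f"{path}/{i}:{child.get('type')}"
        if t.get("type") == "sequence" and child.get("type") == "sequence":
            errors.append(child_path)
        # Recurse so that deeper violations are found as well.
        errors += sequence_nesting_errors(child, child_path)
    return errors
```

Under these assumptions a sequence directly inside a sequence is reported, while a sequence inside a byDimension inside a sequence passes, which is the grandchild case described above.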

All in all, I'm not sure that the sequence-of-sequence restriction is worth enforcing; it adds complexity for validation and, if anything, complicates implementations (see original comment here; pinging @will-moore for the counterpoint). There are other cases where nesting transformations is not the most efficient or clear way of describing a transformation (inverseOf(inverseOf(t)), bijection(t, inverseOf(t))), but enumerating all such combinations in the spec and then having every reader validate against each case adds unnecessary complexity. A blanket statement of

writer APIs SHOULD encourage clear, concise descriptions of transformations

allows writers to make ergonomic choices relevant to their domain, while readers can be permissive and just re-use the same parsing logic.

tl;dr proposals/questions

  1. change the sequence-of-sequences MUST NOT into a SHOULD NOT, improving reader consistency
  2. why are input/output systems required for children of a byDimension? The sub-system is already implied by the parent's input/output and the child's inputAxes/outputAxes. Dropping the requirement would improve reader and writer consistency
  3. to go the other way, could child transformations have a "SHOULD NOT contain input/output, and readers MAY choose not to validate". The value of determining the coordinate system halfway through a sequence or byDimension seems slim, and for the others it's just an extra opportunity for a footgun.

clbarnes avatar Nov 03 '25 17:11 clbarnes

The way that I implemented it, I found it very easy to handle the case of a sequence transform containing just 1 level of nesting - see https://github.com/will-moore/napari-ome-zarr/pull/2/files#diff-1c9c253d036df88195dd4242fc3afde18a2f409c5e98e622b9421a823ac6dac2R109

It wouldn't be much harder to implement a recursive function to handle n-levels of nesting, so I'm not going to object if that's what people want.
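A recursive function of that kind might look roughly like the sketch below. It is not the code from the linked PR; it assumes each transformation is a plain dict with a `type` key and that a sequence lists its children under a `transformations` key, and `flatten_sequence` is a hypothetical name.

```python
from typing import Any


def flatten_sequence(t: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand arbitrarily nested sequences into one flat, ordered list of steps."""
    if t.get("type") != "sequence":
        # Any non-sequence transformation is a single step.
        return [t]
    steps: list[dict[str, Any]] = []
    for child in t.get("transformations", []):
        steps.extend(flatten_sequence(child))
    return steps
```

This only unwraps sequences nested directly inside sequences; a sequence held by some other transformation type is returned as part of that transformation, untouched.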

will-moore avatar Nov 03 '25 18:11 will-moore

My two cents:

"[input and output fields] MUST be present if it's a byDimension (why?)"

I also didn't immediately understand this, and at https://github.com/ome/ngff/issues/358 I have a proposal to improve (IMO) byDimension

I agree that although having nested sequences is more complex, from an implementation view it's easy enough to handle nested sequences (I can find the code for this in ome-zarr-models if anyone's interested).

"child transformations have a 'SHOULD NOT contain input/output, and readers MAY choose not to validate'"

If people are writing down input/output, I think it's reasonable to make sure that the values are consistent, otherwise it might indicate a mistake in the metadata they wrote. Either way, a statement that could result in different readers disagreeing on what's valid/invalid metadata seems like a bad idea and a source of potential confusion.

dstansby avatar Nov 04 '25 09:11 dstansby

"If people are writing down input/output, I think it's reasonable to make sure that the values are consistent"

Yes, that's fair. In that case, I'd err on the side of disallowing input/output fields in child transformations; maybe I'm missing the value of it but it seems to just add opportunities for invalid data. Then we can make a clear statement: root transformations MUST have input/output; child transformations MUST NOT have input/output, without having to enumerate "except in the case of X or Y, unless Z..." in half a dozen places.

Good serialisation frameworks should be able to handle this without much code duplication: in OO languages, the root form of a transformation would just subclass the child form (RootTransformation could even be a mixin), and in languages favouring composition you could just delegate the transformation interface from the root form to the child form.
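As a rough illustration of the subclassing approach, here is a minimal sketch using a scale transformation. The class names `Scale` and `RootScale`, the `parse_scale` helper, and the dict layout are all hypothetical, and the MUST / MUST NOT checks encode the rule proposed in this comment, not the current RFC text.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Scale:
    """Child form: parameters only, no coordinate systems."""

    scale: list[float]

    def apply(self, point: list[float]) -> list[float]:
        return [p * s for p, s in zip(point, self.scale)]


@dataclass
class RootScale(Scale):
    """Root form: the child form plus input/output coordinate systems."""

    input: str
    output: str


def parse_scale(meta: dict[str, Any], *, root: bool) -> Scale:
    """Parse a scale transformation, enforcing the proposed input/output rule."""
    if root:
        if "input" not in meta or "output" not in meta:
            raise ValueError("root transformations MUST have input/output")
        return RootScale(scale=meta["scale"], input=meta["input"], output=meta["output"])
    if "input" in meta or "output" in meta:
        raise ValueError("child transformations MUST NOT have input/output")
    return Scale(scale=meta["scale"])
```

Because `RootScale` inherits `apply`, code that evaluates transformations does not need to know whether it was handed a root or a child.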

clbarnes avatar Nov 05 '25 10:11 clbarnes

👍 sounds like a good way forward to me

dstansby avatar Nov 05 '25 10:11 dstansby