ngff RFC5: Simplifying references

RFC5 defines metadata objects that contain references to other metadata entities. A key example is how coordinateTransformation objects can have input and output fields which are the names of coordinateSystem objects. Associating each transformation object with a pair of references to other objects means parsers must perform a lot of checks. If we consider a scale transformation with the following metadata defined in the metadata of a Zarr group:

{'type': 'scale', 'scale': [1,2,3], 'input': A, 'output': B}

then a validator has to check the following things:

are A and B identical?
are A and B the names of coordinateSystem objects?
is A the path to a Zarr array?
is A the name of a coordinate system defined in a Zarr array?
is the transform inside the metadata of a Zarr group that contains multiscale groups?
- are A and B the names of multiscale groups?
is the transform contained in a sequence / inverse / byDimension / bijection transform?
- If the transform is contained in a sequence, is it the first and / or last element?
Is the transform inside a multiscales.datasets JSON object?
- is A identical to the 'path' field of the multiscales dictionary?
is the transform inside the coordinateTransformations field of a multiscales JSON object?
- Is A identical to the name of the intrinsic coordinate system?

It's possible that I missed something, or conveyed redundant checks. I think it's clear that the semantics of the input and output fields are pretty complicated, and unfortunately context-dependent -- the same coordinateTransform object might be valid in one place, but invalid in another, due to the input field alone.

Although coordinate transformations model functions, the above behavior is very unlike how functions typically work in programs or in mathematics. Functions in those domains are reusable, but coordinateTransformations are not, since they are "branded" with the names of input and output coordinateSystem objects.

I think it would be simpler if coordinateTransformations objects were only concerned with one thing: defining an f(x) -> y mapping, where x and y are both tuples of numbers. That means, instead of {'type': 'scale', 'scale': [1, 2, 3], 'input': A, 'output': B}, we would have {'type': 'scale', 'scale': [1, 2, 3]}. This is closer to how functions are defined in programming languages, and also closer to how ome-zarr 0.4 - 0.5 work today. It also makes defining a sequence transform simpler -- instead of worrying about composing the sequence transform's input and output fields with the input and output fields of its contents, you can just define a sequence as an array of other transforms. Very simple.
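To make the "plain function" framing concrete, here is a minimal Python sketch. It is only an illustration, not part of RFC5 or of any existing library: the helper names `scale`, `translation` and `sequence` are hypothetical, and the translation is included only to have something to compose with.

```python
from typing import Callable, Sequence

Point = tuple[float, ...]
Transform = Callable[[Point], Point]


def scale(factors: Sequence[float]) -> Transform:
    """Return a function that multiplies each coordinate by its factor."""
    def apply(point: Point) -> Point:
        if len(point) != len(factors):
            raise ValueError("point and scale have different dimensionality")
        return tuple(p * f for p, f in zip(point, factors))
    return apply


def translation(offsets: Sequence[float]) -> Transform:
    """Return a function that adds an offset to each coordinate."""
    def apply(point: Point) -> Point:
        if len(point) != len(offsets):
            raise ValueError("point and translation have different dimensionality")
        return tuple(p + o for p, o in zip(point, offsets))
    return apply


def sequence(transforms: Sequence[Transform]) -> Transform:
    """A sequence is just a list of transforms applied left to right."""
    def apply(point: Point) -> Point:
        for transform in transforms:
            point = transform(point)
        return point
    return apply


# The same scale function is defined once and reused in two places.
scale_123 = scale([1, 2, 3])
scale_then_shift = sequence([scale_123, translation([10, 10, 10])])

print(scale_123((1.0, 1.0, 1.0)))         # (1.0, 2.0, 3.0)
print(scale_then_shift((1.0, 1.0, 1.0)))  # (11.0, 12.0, 13.0)
```

Nothing in these functions knows which coordinate systems they connect, which is the point: that information would live elsewhere in the metadata.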

But input and output contain important information. We must put that information somewhere else in metadata. Here is my proposal: we make coordinateSystems a JSON object with keys that form the names of output spaces, and values that declare the names of the input space, the transform objects, and the axes of the output space. The whole thing would look like this:

{
    "coordinateTransformations": {
        "affine": {
            "type": "affine", "params": [[0,0,0], [0,0,0], [0,0,0]]
        },
        "nonlinear_tx": {
            "type": "weird_warp",
            "params": "path_to_warp_field"
        },
        "sequence": ["affine", "nonlinear_tx"] // these names are resolved in the `coordinateTransformations` keys
    },
    "coordinateSystems": {
        "default": { // This is the name of the output coordinate system. maybe we require a coordinateSystem named `'default'`?
            "input": null, // same as leaving input unset
            "transforms": ["affine"],
            "axes": [...]
        },
        "inline_scale": {
            "input": "default",
            "transforms": [{"type": "scale", "params": [1,1,1]}], // inline transforms are allowed instead of references
            "axes": [...]
        },
        "atlas_indirect": {
            "input": "default",  // this name is resolved in the `coordinateSystems` keys
            "transforms": ["nonlinear_tx"], // these names are resolved in the `coordinateTransformations` keys
            "axes": [...]
        },
        "atlas_direct": {
            "input": null,
            "transforms": ["sequence"],
            "axes": [...]
        }
    }
}

I think this can convey everything that RFC5 conveys, but IMO it's much simpler. We get simplicity by separating the definition of the coordinate transformations (as plain functions in the coordinateTransformations JSON object) from their application in the coordinateSystems object. This mirrors how functions work in programs: we define them once, and then use (and potentially re-use) them in a separate context where we assign semantics to their inputs and outputs. Maybe it's too late to make these kind of changes to RFC5, but I figured it was worth writing this up in any case :)
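A rough Python sketch of how a reader might resolve the names in this layout. Everything here is an assumption layered on the proposal: the function names `resolve` and `full_chain` are hypothetical, `axes` is left out, and I am assuming that the transforms of a system's `input` are applied before the system's own transforms.

```python
def resolve(ref, registry, seen=()):
    """Expand one entry of a `transforms` list into inline transform objects."""
    if isinstance(ref, dict):  # inline transform, nothing to look up
        return [ref]
    if ref in seen:
        raise ValueError(f"circular transform reference: {ref!r}")
    if ref not in registry:
        raise KeyError(f"unknown transform name: {ref!r}")
    target = registry[ref]
    if isinstance(target, list):  # a named sequence of other references
        resolved = []
        for item in target:
            resolved.extend(resolve(item, registry, seen + (ref,)))
        return resolved
    return [target]


def full_chain(metadata, system_name):
    """Collect every transform from the root (input: null) to `system_name`."""
    registry = metadata["coordinateTransformations"]
    systems = metadata["coordinateSystems"]
    chain, visited, name = [], set(), system_name
    while name is not None:
        if name in visited:
            raise ValueError(f"circular input reference: {name!r}")
        visited.add(name)
        system = systems[name]
        own = []
        for ref in system["transforms"]:
            own.extend(resolve(ref, registry))
        chain = own + chain  # assumed: the input system's transforms run first
        name = system.get("input")
    return chain


metadata = {
    "coordinateTransformations": {
        "affine": {"type": "affine", "params": [[0, 0, 0], [0, 0, 0], [0, 0, 0]]},
        "nonlinear_tx": {"type": "weird_warp", "params": "path_to_warp_field"},
        "sequence": ["affine", "nonlinear_tx"],
    },
    "coordinateSystems": {
        "default": {"input": None, "transforms": ["affine"]},
        "inline_scale": {
            "input": "default",
            "transforms": [{"type": "scale", "params": [1, 1, 1]}],
        },
        "atlas_indirect": {"input": "default", "transforms": ["nonlinear_tx"]},
        "atlas_direct": {"input": None, "transforms": ["sequence"]},
    },
}

print([t["type"] for t in full_chain(metadata, "atlas_indirect")])  # ['affine', 'weird_warp']
print([t["type"] for t in full_chain(metadata, "atlas_direct")])    # ['affine', 'weird_warp']
```

Under that assumption, `atlas_indirect` and `atlas_direct` resolve to the same list of transforms, reached by two different routes.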

Nov 03 '25 20:11 d-v-b

btw let me know if this should not be a github issue but rather an rfc comment -- I was copying the model of the other recent RFC5-themed issues in the issue tracker.

Nov 03 '25 21:11 d-v-b

One downside of separating the input/output and the transform itself is that to understand a transformation between two systems, you have to look in two places in the metadata instead of one. Perhaps this is a trade off worth making if transform definitions are likely to be re-used many times, but in most (all?) cases in the examples (https://github.com/dstansby/rfc5-example-transform-graphs) I don't think I've come across a case where two transforms with the same parameters but different input/output are used.

Nov 04 '25 09:11 dstansby

One downside of separating the input/output and the transform itself is that to understand a transformation between two systems, you have to look in two places in the metadata instead of one.

This is true, but in my proposal references to transforms can always be substituted with actual transforms (see the 'inline_scale' example, which has a scale transformation defined inline).

Perhaps this is a trade off worth making if transform definitions are likely to be re-used many times

I've worked on image registration problems where I iteratively added increasingly complex transformations, but I also wanted to check the result of each "stage" individually. That means, for a final sequence of transforms ['scale', 'affine', 'warp'], you might want to view the same image under 3 different coordinate systems: one using ['scale'], another using ['scale', 'affine'], and finally ['scale', 'affine', 'warp']. I imagine this is pretty common when debugging image registration?
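A small Python sketch of that staged workflow, assuming the proposed layout where coordinate systems refer to transforms by name. The function `staged_systems` and the way the systems are named are hypothetical, and `axes` is omitted.

```python
def staged_systems(stages, input_name="default"):
    """Build one coordinate system per prefix of the final transform sequence."""
    systems = {}
    for end in range(1, len(stages) + 1):
        prefix = list(stages[:end])
        # Each stage reuses the already-named transforms instead of redefining them.
        systems[" + ".join(prefix)] = {"input": input_name, "transforms": prefix}
    return systems


for name, system in staged_systems(["scale", "affine", "warp"]).items():
    print(name, "->", system["transforms"])
```

Each transform name appears in several coordinate systems, but its parameters would only be written once.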

Nov 04 '25 09:11 d-v-b

also, the current text of RFC5 requires looking in 3 places to understand a transform: besides the transform itself, you have to resolve the input reference, and the output reference. In the current text, if you start at a particular coordinateSystem, I don't think there's a requirement that it is ever used by transforms? So in principle you have to do an exhaustive search over a lot of metadata to figure out where / if there's a transform that maps points to that coordinate system. In my approach, it's not possible for a coordinate system to go unused.
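For illustration, the exhaustive search could look roughly like this in Python. This is a sketch that assumes a flat list of transforms with string `input` / `output` fields inside one metadata document; it does not look inside wrapper transforms or other Zarr nodes, and `unused_coordinate_systems` is a hypothetical helper name.

```python
def unused_coordinate_systems(systems, transforms):
    """Return names of coordinate systems that no transform mentions."""
    referenced = set()
    for transform in transforms:
        for key in ("input", "output"):
            if key in transform:
                referenced.add(transform[key])
    return [s["name"] for s in systems if s["name"] not in referenced]


systems = [{"name": "default"}, {"name": "system2"}, {"name": "orphan"}]
transforms = [{"type": "scale", "scale": [1, 1, 1], "input": "default", "output": "system2"}]

print(unused_coordinate_systems(systems, transforms))  # ['orphan']
```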

Nov 04 '25 10:11 d-v-b

another important example of transform re-use: a transformation that applies to all the scale levels of a multiscale image. Right now I think we handle that by adding an extra key to the multiscales JSON object, but we could avoid this extra key by making transform objects simply reusable.

Nov 04 '25 10:11 d-v-b

I think that this proposal has some nice features such as using dictionaries instead of lists for coordinateTransformations and coordinateSystems. And it does make transforms a bit more granular.

It makes coordinateTransformations simpler but coordinateSystems more complex, so I'm not sure that the overall graph is very much simpler.

You can get some reuse of transforms in the existing RFC5 spec. For your ['scale', 'affine', 'warp'] example you can have:

coordinateTransforms: [
  {"scale": [1,1,1], "input": "default", "output": "system2"},
  {"affine": [[1,1],[1,1]], "input": "system2", "output": "system3"},
  {"warp": "path", "input": "system3", "output": "system4"}
],
coordinateSystems: [
  {"name": "default", "axes": {}},
  {"name": "system2", "axes": {}},
  {"name": "system3", "axes": {}},
  {"name": "system4", "axes": {}}
]

So, system4 is the result of ['scale', 'affine', 'warp'], system3 is the result of ['scale', 'affine'] etc.
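As a sketch of how a reader would recover those chains, here is a breadth-first search in Python that treats each transform as a directed edge from `input` to `output`. The `type` key and the function name `path_between` are assumptions made for the example, and wrapper transforms are not handled.

```python
from collections import deque


def path_between(transforms, source, target):
    """Return the list of transforms leading from source to target, or None."""
    edges = {}
    for transform in transforms:
        edges.setdefault(transform["input"], []).append(transform)
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for transform in edges.get(node, []):
            if transform["output"] not in visited:
                visited.add(transform["output"])
                queue.append((transform["output"], path + [transform]))
    return None  # no chain of transforms connects the two systems


chain = [
    {"type": "scale", "input": "default", "output": "system2"},
    {"type": "affine", "input": "system2", "output": "system3"},
    {"type": "warp", "input": "system3", "output": "system4"},
]

print([t["type"] for t in path_between(chain, "default", "system3")])  # ['scale', 'affine']
print([t["type"] for t in path_between(chain, "default", "system4")])  # ['scale', 'affine', 'warp']
```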

If we were starting from scratch I would likely vote for this proposal, but I'm not yet convinced it's a big-enough improvement to make the change from current RFC5.

Nov 04 '25 10:11 will-moore

On the topic of resolving what the values in input and output mean, over the last few weeks myself and @will-moore and @jo-mueller discussed this quite a bit, and this was settled on (ref: current RFC5):

Most coordinate transformations MUST specify their input and output coordinate systems using input and output with a string value that MUST correspond to the name of a coordinate system or the path to a multiscales group

This allows one to reference a coordinate system name (within the same OME-Zarr group) or a multiscales group, while still being relatively simple. So parsing and interpreting these fields isn't that complex at the moment.

On the topic of input/output in sequences, it's possible to just omit input/output when transforms are in sequences. On the implementation side this doesn't add much complexity - you can just ignore all the input/output fields in transforms in a sequence, and users don't even have to specify them. This does get a bit sticky when considering what to do when the user does specify input/output that isn't consistent. In https://github.com/ome/ngff/issues/359#issuecomment-3484947126 I said that it's reasonable to validate input/output if the user writes them down inside a sequence, but I can see a world where RFC5 instead mandates that implementations MUST ignore any input/output fields in sequence transforms.

On the topic of the proposed new structure:

we make coordinateSystems a JSON object with keys that form the names of output spaces, and values that declare the names of the input space,

This does not make sense to me - a coordinate system exists independently of any other coordinate systems. And transforms link those coordinate systems. In graph language, I think of coordinate systems as nodes and coordinate transforms as directed edges between those nodes. RFC5 as it stands does a good job of modelling these two objects.

On other points:

... iteratively added increasingly complex transformations,

This is not an example of re-using the same transform function multiple times.

if you start at a particular coordinateSystem, I don't think there's a requirement that it is ever used by transforms?

This is true I think - I raised it already at https://github.com/ome/ngff/issues/357.

another important example of transform re-use: a transformation that applies to all the scale levels of a multiscale image.

You can do this currently by introducing an intermediate transform at the multiscales level. For two scale levels, something like

scale0 > intermediate
scale1 > intermediate
intermediate > default

where each arrow is a transformation, and "intermediate" and "default" are named coordinate systems for the multiscale group.

Nov 04 '25 14:11 dstansby

This does not make sense to me - a coordinate system exists independently of any other coordinate systems. And transforms link those coordinate systems. In graph language, I think of coordinate systems as nodes and coordinate transforms as directed edges between those nodes. RFC5 as it stands does a good job of modelling these two objects.

A coordinate system is just a named set of axes, right? I don't have a strong opinion about whether a named set of axes "exists" independently of any other named set of axes. But in the graph formalism, with a graph like a -> b -> c arriving at c node does depend on passing through b. You could also use a single transform that skips b. So for me this isn't a conceptual problem.

Maybe that's actually a useful framing: in my proposal, or some variant of this proposal, you can take a coordinate system and quickly ask "how do I send points to this coordinate system"? By contrast, in the current text of RFC5, a single coordinate system tells you nothing about how it's used. Instead, you take a single transform and quickly ask "what coordinate spaces is this transform used in?". But because transforms can contain other transforms, answering this question can be pretty complicated or even undefined.

Nov 04 '25 14:11 d-v-b

... iteratively added increasingly complex transformations,

This is not an example of re-using the same transform function multiple times.

This is what I had in mind. The transforms get re-used.


{
    "coordinateTransformations": {
        "scale":  {"type": "scale", "params": [1,2,3]},
        "affine": {
            "type": "affine", "params": [[0,0,0], [0,0,0], [0,0,0]]
        },
        "nonlinear_tx": {
            "type": "weird_warp",
            "params": "path_to_warp_field"
        }
    },
    "coordinateSystems": {
        "default": {
            "input": null, // same as leaving input unset
            "transforms": [], // array indices -> array indices. no need for identity.
            "axes": [...]
        },
        "scale": {
            "input": "default", 
            "transforms": ["scale"], 
            "axes": [...]
        }, 
        "scale + affine": {
            "input": "default",
            "transforms": ["scale", "affine"],
            "axes": [...]
        },
        "full": {
            "input": "default", 
            "transforms": ["scale", "affine", "nonlinear_tx"],
            "axes": [...]
        }
    }
}

Nov 04 '25 14:11 d-v-b

Late to the party, just 2 cents from my side:

Maybe it's too late to make these kind of changes to RFC5, but I figured it was worth writing this up in any case :)

Noted and appreciated! I definitely see value in writing this down. In case usage of rfc5 in its current form reveals unforeseen plot holes, it's nice to have this written out (i.e. as comment?) as something we can come back to later, instead of having to browse a ton of gh issues in search of this one.

To be honest, at this time I see little room for adoption into rfc5 - primarily due to the timing. Maybe a comment would be the right place.

A coordinate system is just a named set of axes, right? I don't have a strong opinion about whether a named set of axes "exists" independently of any other named set of axes. But in the graph formalism, with a graph like a -> b -> c arriving at c node does depend on passing through b. You could also use a single transform that skips b. So for me this isn't a conceptual problem.

True. The fact that, in the present state of RFC5, coordinateSystems can be defined without being referenced by any transformation (see #357) is definitely something we want to address in the spec.

But because transforms can contain other transforms, answering this question can be pretty complicated or even undefined

I'm not sure I'm 100% convinced of this statement. Yes, there's the sequence, but the spec explicitly doesn't allow it to contain other sequences, so we don't have to deal with overly complicated nested transforms.

Aside from that, I'm not sure that the proposition above has a solution for how to link coordinate systems from different images. I.e., for registration problems I think one would still have to allow images as inputs to find a good tradeoff between keeping the multiscales metadata concise while having enough expressive freedom on the root level to describe connections between images.

I don't think that rfc5 is already at the end of the road, but I think it's a fair tradeoff between

clarity of expression,
expressive freedom of transforms (i.e., describing the chaining of multiple coordinate transformations from one coordinate system to another is well possible, with exactly that usecase in mind)
metadata checks to perform when reading

When I was working on the schemas (current state: https://github.com/ome/ngff-spec/pull/17), I also found it a bit unsatisfactory that many requirements are hard if not impossible to express in schema language, but judging from the proposal above I don't see that we'll be able to navigate around that entirely.

Technicalities aside, I think the proposal above may also arrive at a similar maturity as rfc5 in the present form and the only reason keeping it from doing so is the fact that rfc5 has already progressed very far through the approval process. We definitely need a better forum to have these wide-eyed discussions earlier in the process.

Nov 04 '25 14:11 jo-mueller

I'm not sure I'm 100% convinced of this statement. Yes, there's the sequence, but the spec explicitly doesn't allow it to contain other sequences, so we don't have to deal with overly complicated nested transforms.

Can't a sequence contain another wrapper transform (like inverseOf) that itself contains a sequence? You can of course plug this hole by disallowing sequence transforms inside any wrapper transforms, but you are paying with more complexity, more special cases, etc.

Ignoring things like singular matrices, there's basically 1 thing that can go wrong with a transform in the proposal I have here: the dimensionality of the array doesn't match the parameters of the transform (e.g., 1D array, 2D transform). This is really easy to check.
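A minimal Python sketch of that check, assuming the proposal's `type` / `params` layout. Only transforms with one parameter per axis are covered, since the parameter shapes for other types are not pinned down in this thread; `dimensionality_errors` is a hypothetical helper name.

```python
def dimensionality_errors(transform, ndim):
    """Compare the number of per-axis parameters with the array dimensionality."""
    kind = transform["type"]
    params = transform["params"]
    if kind not in ("scale", "translation"):
        # Other types would need their own rule; none is assumed here.
        raise NotImplementedError(f"no dimensionality rule for type {kind!r}")
    if len(params) != ndim:
        return [f"{kind} has {len(params)} parameters but the array has {ndim} dimensions"]
    return []


# The "1D array, 2D transform" case from the text:
print(dimensionality_errors({"type": "scale", "params": [1, 1]}, ndim=1))
print(dimensionality_errors({"type": "scale", "params": [1, 1]}, ndim=2))  # []
```

The check needs only the transform and the array shape, with no lookup of surrounding metadata.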

In RFC5, there are many different ways the input and output parameters can be problematic, each one requires a different code path, and many of them are context-sensitive. The context-sensitivity is the biggest problem, IMO -- statements like "for object foo, the x field is required unless foo is contained inside a bar..." are simple to write in the spec but generate a lot of unnecessary code in the implementations. That being said, I'm not holding my breath for any big changes to RFC5.

Nov 04 '25 15:11 d-v-b

Can't a sequence contain another wrapper transform (like inverseOf) that itself contains a sequence? You can of course plug this hole by disallowing sequence transforms inside any wrapper transforms, but you are paying with more complexity, more special cases, etc.

Well, sequences are not allowed to contain another sequence. Maybe it's a bit vague in the wording but it could be argued that this applies to sequences nested in another transformation, too.

Nov 04 '25 15:11 jo-mueller

Aside from that, I'm not sure that the proposition above has a solution for how to link coordinate systems from different images. I.e., for registration problems I think one would still have to allow images as inputs to find a good tradeoff between keeping the multiscales metadata concise while having enough expressive freedom on the root level to describe connections between images.

I don't have something solid, but I was thinking of using ':' delimited strings to denote coordinate systems from other images. Something like 'path/to/node:name_of_coordinate_system'. If you disallow the ':' symbol from the names of coordinate systems, this should be totally unambiguous and resolvable.
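A quick Python sketch of how such a string could be parsed. The names `SystemRef` and `parse_system_ref` are hypothetical, and the sketch assumes that ':' is forbidden in both node paths and coordinate system names, so any string with more than one ':' is rejected.

```python
from typing import NamedTuple, Optional


class SystemRef(NamedTuple):
    node_path: Optional[str]  # None means "a coordinate system in this node"
    system_name: str


def parse_system_ref(ref: str) -> SystemRef:
    """Split 'path/to/node:name' into its node path and coordinate system name."""
    if ref.count(":") > 1:
        raise ValueError(f"more than one ':' in reference: {ref!r}")
    if ":" not in ref:
        return SystemRef(None, ref)
    node_path, system_name = ref.split(":")
    if not node_path or not system_name:
        raise ValueError(f"empty path or name in reference: {ref!r}")
    return SystemRef(node_path, system_name)


print(parse_system_ref("path/to/node:name_of_coordinate_system"))
print(parse_system_ref("default"))
```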

Nov 04 '25 17:11 d-v-b

Would adopting that solution to the current rfc5 somewhat resolve your concerns? I was thinking about this with @dstansby (i.e., prepending paths with /) but iirc I was a bit wary of adding that exclusively for transformations. If introduced, I think such a rule should really apply to the whole spec.

Nov 05 '25 08:11 jo-mueller

before changing anything about how references are declared and resolved I would probably add a section to RFC5 that explains how they are currently intended to work, because I think this info is kind of scattered throughout the spec right now. A few key questions to answer:

are references to other zarr groups / arrays paths relative to the current node, absolute paths, or URLs?
- if they are relative, are references to parent nodes allowed via "../../"?
- If they are absolute, what is the root of the path?
- If they are URLs, e.g. 's3://bucket/other-container.zarr/transform', then the ':' symbol can't be used as a delimiter.

IMO the simplest thing is to only allow relative paths descending from the current node, (so no "../../"). But I don't know if this is consistent with the current spec.
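A sketch of what "relative, descending only" could mean as a check, in Python. This is one possible reading rather than anything the spec currently says; `is_descending_relative` is a hypothetical name, and URLs are rejected simply by looking for "://".

```python
from pathlib import PurePosixPath


def is_descending_relative(path: str) -> bool:
    """True if the path is relative and never steps up to a parent node."""
    if not path or "://" in path:  # empty strings and URLs are not accepted
        return False
    parsed = PurePosixPath(path)
    if parsed.is_absolute():
        return False
    return ".." not in parsed.parts


for candidate in ["labels/cells", "./labels/cells", "../../image", "/image", "s3://bucket/other.zarr"]:
    print(candidate, is_descending_relative(candidate))
```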

Nov 05 '25 08:11 d-v-b

A major use case for sharing transformations between multiscale datasets is when you have an image volume and labels for that volume. You'd usually define the image multiscale first, with sufficient transformations to move it into world coordinates, then generate the label multiscale, and it may be convenient to guarantee that they use the same transformations (or at least the same coordinate systems) rather than replicate the config - certainly you wouldn't want to replicate a coordinate or displacement volume for each set of labels. That said, if (fully-compliant) viewers have to account for overlaying volumes with different transformations and resolutions, then maybe replicating the config is simpler than dealing with a whole reference system.

But given we allow paths in a few places, we should definitely make it clear where the starting point of that path is. Maybe this should go into another issue, but I think that absolute paths are tricky because not all intermediate groups need to have a metadata document, which means that to find the root of a node at s3://bucket/prefixA/prefixB/prefixC/prefixD/prefixE/groupPrefix, the only way to find the root is to look for bucket/zarr.json, bucket/prefixA/zarr.json, bucket/prefixA/prefixB/zarr.json etc., which is not necessarily what the requester even wants. In my previous lab, we had N5 datasets scattered across a bunch of different network storage and then mounted and symlinked them underneath a single root on a central server for ease of access, which was very convenient for us.
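To show how many lookups that root search implies, here is a small Python sketch that only builds the list of candidate keys; it does not talk to any store. The function name `candidate_metadata_keys` is hypothetical.

```python
def candidate_metadata_keys(bucket, node_key):
    """List every zarr.json key between the bucket root and the node, top down."""
    parts = [part for part in node_key.split("/") if part]
    return [
        "/".join([bucket, *parts[:depth], "zarr.json"])
        for depth in range(len(parts) + 1)
    ]


for key in candidate_metadata_keys("bucket", "prefixA/prefixB/groupPrefix"):
    print(key)
```

A node nested N levels deep needs up to N + 1 probes, and the first hit is not necessarily the root the requester had in mind.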

At the very least we can clearly distinguish names, relative paths, and absolute paths by requiring a ./ prefix for relative paths, / for absolute paths (if we are to support that at all), and no such prefix for a name which lives in a registry somewhere.
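That prefix rule is simple to sketch in Python. The function name and the returned labels are made up for the example; the rule itself is just the one described above.

```python
def classify_reference(ref: str) -> str:
    """Classify a reference string purely by its prefix."""
    if ref.startswith("./"):
        return "relative_path"
    if ref.startswith("/"):
        return "absolute_path"
    return "name"


for ref in ["./labels/cells", "/other/image", "default"]:
    print(ref, "->", classify_reference(ref))
```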

Nov 06 '25 12:11 clbarnes

are there any situations where child nodes can make references to parent nodes, or sibling nodes? If not, then pure relative paths can be used exclusively. IMO this would be a massive simplification.

Nov 06 '25 12:11 d-v-b

are there any situations where child nodes can make references to parent nodes, or sibling nodes? If not, then pure relative paths can be used exclusively. IMO this would be a massive simplification.

I don't think this is the case in RFC5. In other collection-formats of sorts (i.e., wells, hcs, etc) I don't remember reading about restrictions regarding relative paths. There, paths are always relative paths that point down, but never into the parent folder. The only exception is labels, whose path is ../../, but I think this will be simplified by RFC8 already.

Nov 06 '25 12:11 jo-mueller

I don't think this is the case in RFC5.

Then maybe these two statements need to be clarified? They state that scale and translation can be zarr arrays at an arbitrary location in the container.

I would replace these statements with a reference to a central section that explains how references work.

Nov 06 '25 12:11 d-v-b

I would replace these statements with a reference to a central section that explains how references work.

Should this section also clarify how referencing other json objects works? Or would this section basically specify:

Paths are always relative
Paths always point down (with the sole exception of labels)
Paths are always prepended with /

Nov 06 '25 13:11 jo-mueller

I would think the rules for references to other JSON objects in the same metadata document depend a lot more on the details of the particular metadata object in question, whereas rules about references to other Zarr nodes should probably be defined for all metadata objects, but that's just my feeling.

Also I'm not sure you want the paths to be prepended with /, because that makes them absolute and not relative

Nov 06 '25 13:11 d-v-b

I agree that if we're going to use UNIXy conventions for ancestor groups ../../, we should try to avoid using UNIXy conventions for things other than their UNIXy meaning (i.e. don't use a / prefix for relative paths).

Nov 06 '25 14:11 clbarnes