Allow `input` and `output` transformations (RFC-5) to be `null` (or any optional field!)
Proposal
We propose to allow the input and output fields of coordinate transformations to have a value of null, with the same meaning as being unset (originally proposed by @dstansby here: https://github.com/ome-zarr-models/ome-zarr-models-py/pull/187#discussion_r2094632751).
Or more generally, to have this meaning for all the fields that are not required by the specs. Note that for certain fields, like name of multiscale objects, the specs (AFAIK) do not forbid the use of null, which is why ome-zarr-models-py already allows them to be written as null.
Motivation
In ome-zarr-models-py we define fields that are not required as optional, with a default value of None (e.g. name: str | None = None). When serializing these fields to JSON, the None values get represented as null, so allowing null values in the specs, with the same semantics as having them omitted, would keep Pydantic implementations simple.
Alternatives (not ideal)
An alternative, if null is not allowed in the specs, would be to explore the use of exclude_none when calling model_dump_json(), but this would require the user to know about the need for this parameter, or would require us to introduce a helper function that wraps model_dump_json(). Another alternative is to use patterns like name: str | None = Field(default=PydanticUndefined) (PydanticUndefined can be imported from pydantic_core), but this use is not advertised in the pydantic docs.
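To make the "helper function" alternative concrete, here is a minimal sketch that uses only the standard library instead of Pydantic. `Multiscale` and `dump_without_none` are hypothetical names invented for this illustration; the helper only mimics the effect of dropping None-valued fields before serialization and is not the ome-zarr-models-py implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Multiscale:
    # Hypothetical stand-in for a model with one optional field.
    version: str
    name: Optional[str] = None


def dump_without_none(obj) -> str:
    """Serialize a dataclass to JSON, omitting top-level fields set to None."""
    data = {k: v for k, v in asdict(obj).items() if v is not None}
    return json.dumps(data)


m = Multiscale(version="0.5")
default_json = json.dumps(asdict(m))  # None is written as JSON null
compact_json = dump_without_none(m)   # the unset field is omitted
```

With the plain dump, `name` appears as `null`; with the helper it is left out. The drawback mentioned above still applies: callers have to know to use the helper.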
CC @d-v-b
Hi @LucaMarconato , thanks for the proposal. So far, null values are hardly used in the spec. Right now, only the axis allows the following:
The "axes" MUST contain 2 or 3 entries of "type:space" and MAY contain one additional entry of "type:time" and MAY contain one additional entry of "type:channel" or a null / custom type.
I don't think I know a strong reason to keep nulls out of the metadata, except for two points:
Or more generally, to have this meaning for all the fields that are not required from the specs.
Would using null as a placeholder for all recommended but not required field have consequences for the validation? I.e., would the validator encounter null and skip the warning it would usually issue when encountering a missing recommended, but not required field? Would/should validation fail if a null placeholder for a recommended field were missing?
Lastly - I agree that from a Python point of view, popping None values out of a dictionary can be a bit of a pain, but how would developers of other languages see this? I.e., what would a Java/whatever reader do when encountering a null value? Is that easy to parse and handle?
Would using null as a placeholder for all recommended but not required field have consequences for the validation? I.e., would the validator encounter null and skip the warning it would usually issue when encountering a missing recommended, but not required field? Would/should validation fail if a null placeholder for a recommended field were missing?
I don't see why there would be any validation problems if the spec declares that {"foo": null, "bar": 10} has the same meaning as {"bar": 10}.
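As a rough sketch of what "same meaning" could look like for a reader, assuming the spec adopted that rule: a consumer could normalize metadata by dropping null-valued members before validating or comparing. `drop_nulls` is a hypothetical helper, not part of any existing OME-Zarr library.

```python
import json


def drop_nulls(value):
    """Recursively remove object members whose value is null (None)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Nulls inside arrays are kept: removing them would shift positions.
        return [drop_nulls(v) for v in value]
    return value


with_null = json.loads('{"foo": null, "bar": 10}')
without_key = json.loads('{"bar": 10}')

# After normalization both documents compare equal.
same_meaning = drop_nulls(with_null) == drop_nulls(without_key)
```

A validator that runs after such a normalization step would see identical input for both forms, so it would emit the same warnings for either.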
Reasons to allow omitting fields that are null:
- Reduce the size of their zarr.json objects
- readability improvement for JSON objects with tens or hundreds of fields that generally go unset.
Reasons to disallow {"foo" : null} as a synonym for {}:
- null has some special meaning in the context of the key "foo", which is semantically different from the "foo" key being unset. I don't think we have any cases like this in ome-zarr.
And I don't see why this would be problematic for other languages. If they support JSON, then they support the value null already.
Hi @LucaMarconato , just to doublecheck as a lead-up to the rfc5 replies: a simple statement like this would already do it, right?
If unused, the input and output fields MAY be null.
I'm slightly confused by this issue since there seem to be a couple of questions:
If the spec says: "[coordinateTransformations] MUST contain the field “output”, unless part of a sequence or inverseOf" then I don't think the spec would allow output to be null. If the spec allowed null for any required fields then that's effectively saying those fields are not required.
For attributes that are optional (e.g. output when the transform is part of a sequence), I think it's OK to allow {"foo": null}. I'm not sure how this is semantically different from being unset, and the spec doesn't say anything about it.
If the spec allowed null for any required fields then that's effectively saying those fields are not required.
It's totally possible for a field to be required and take null as a value. The values a field is allowed to take is a separate question from whether the field is allowed to be unset.
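A small sketch of that distinction, under the assumption that a reader wants to tell the three cases apart: "is the key present?" and "is the value null?" are independent checks. `describe_field` and the `_MISSING` sentinel are hypothetical names used only for illustration.

```python
import json

_MISSING = object()  # sentinel: distinguishes "key absent" from "key is null"


def describe_field(metadata: dict, key: str) -> str:
    """Report whether a key is absent, present with null, or present with a value."""
    value = metadata.get(key, _MISSING)
    if value is _MISSING:
        return "unset"
    if value is None:
        return "null"
    return "set"


present_null = describe_field(json.loads('{"output": null}'), "output")
absent = describe_field(json.loads('{}'), "output")
present_value = describe_field(json.loads('{"output": "physical"}'), "output")
```

A rule such as "required, but may be null" would accept the first and third documents and reject the second; a rule treating null as unset would collapse the first two cases into one.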
For attributes that are optional (e.g. output when the transform is part of a sequence), I think it's OK to allow {"foo": null}.
From my understanding, that was the original idea:
We propose to allow the input and output fields of coordinate transformations to have a value of null, with the same meaning that they are unset.
The spec doesn't mention this elsewhere, so I see no harm in writing it into the spec. Maybe in the future we find a general answer to this (i.e., are null values allowed for optional fields?) but in the meantime this could just go into rfc5. A general statement like
Optional parameters MAY carry the value NULL in the metadata
(I'm sure there's a better way to phrase this) could then supersede this down the road.
We propose to allow the input and output fields of coordinate transformations to have a value of null, with the same meaning that they are unset.
What does it mean if they are unset?
I was just proposing updates to the wording of the current spec where it states the requirements for input and output at https://github.com/ome/ngff-spec/issues/8
I'm not sure what the purpose of a coordinateTransformation would be if it doesn't specify what it applies to?
What does it mean if they are unset?
I think that applies to the very specific case of coordinateTransformations inside a sequence transformation. Otherwise there's no meaning to an unset input.
With the benefit of hindsight, I think my original request/proposal to allow null in these fields merits wider discussion that should be decided on with respect to the specification as a whole:
Should setting any field to null in OME-Zarr metadata have the same meaning as the field not being present?
On a practical level, in ome-zarr-models, adopting this would make our lives easier. But there may well be other points in favour of not adopting this, and I think on a technical level we can work around this if needed.
Will be fixed by https://github.com/ome/ngff/pull/350.
Hi @LucaMarconato , just a quick bump here to ask whether this can be closed now?
I see you added "If unused, the input and output fields MAY be null.", sounds good to me! #350