No way to distinguish image and labels-image groups in 0.5
In version 0.4, it was possible to tell whether a group was an image group or an image labels group, because an image labels group was required to have an image-label entry in the metadata (https://ngff.openmicroscopy.org/0.4/index.html#label-md).
In version 0.5, it is no longer possible to distinguish between these two types of group, because image labels are no longer required to have an image-label entry. Instead, the spec only says this entry should (not must) be present. (https://ngff.openmicroscopy.org/0.5/index.html#labels-md).
This means that every valid image group is also a valid image-label group, and vice versa.
Would it be useful to still distinguish between these two types of group, and if so, introduce some new metadata that allows them to be distinguished?
Personally I think there is value in distinguishing between these groups so they can be interpreted differently by image viewers (a labels dataset should be displayed differently to an imaging dataset, for example) and readers, and so scientists can signal their intent as to whether a group is an image or labels.
The 0.5 specification says:
The "labels" group is nested within an image group, at the same level of the Zarr hierarchy as the resolution levels for the original image. The OME-Zarr Metadata in the zarr.json file associated with the "labels" group MUST contain a JSON object with the key labels, whose value is a JSON array of paths to the labeled multiscale image(s).
Without the presence of the labels group under the image's metadata, I would say it's safe to assume that you are looking at an image and not a label?
What if I had a single labels image, that was not associated with an underlying 'normal' image, and therefore not inside a labels group?
And in general I don't think it's possible to navigate "down" a Zarr hierarchy, at least in zarr-python. So if (say) ome-zarr-models is given as input a group that contains a labels image, it has no way to inspect the group below to see if its name is "labels".
What would be the downside of changing SHOULD to MUST in the following sentence?
Metadata in this image-level zarr.json file SHOULD contain another key, image-label
If an image doesn't have the image-label key, it can simply be interpreted as an image anyway, but if it does have the key, it's interpreted as an image-label group.
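To make the proposed rule concrete, here is a minimal Python sketch. The helper name classify_ome_group is hypothetical, and it assumes the 0.5 layout where OME metadata sits under the "ome" key of a group's attributes. Note that it only looks at the group's own metadata, never at its parent or children.

```python
from typing import Any


def classify_ome_group(attributes: dict[str, Any]) -> str:
    """Classify a group from the "attributes" object of its zarr.json.

    Hypothetical helper illustrating the proposed rule: if "image-label" were a
    MUST, its presence alone would separate label images from plain images.
    """
    ome = attributes.get("ome", {})
    if "multiscales" not in ome:
        # Neither an image nor a label image (e.g. a "labels" or "plate" group).
        return "other"
    if "image-label" in ome:
        return "image-label"
    # No "image-label" key: simply interpreted as an image.
    return "image"


if __name__ == "__main__":
    print(classify_ome_group({"ome": {"version": "0.5", "multiscales": [{}]}}))
    print(classify_ome_group({"ome": {"version": "0.5", "multiscales": [{}], "image-label": {}}}))
```

Under the current SHOULD wording, the "image" branch is ambiguous: a label image written without the key lands there too, which is exactly the problem raised in this issue.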
coming here because i was playing around with refactoring some stuff in ome-zarr-models-py, found something that seemed surprising, and found https://github.com/ome-zarr-models/ome-zarr-models-py/pull/286, which led me here. my $0.02:
tldr: I do ultimately agree with @dstansby that SHOULD should be changed back to MUST. (But I also think that ome-zarr-models-py should just make image_label a mandatory field in the ImageLabelAttrs class, i.e. undo this change from that PR, even if it "technically" doesn't follow the literal words of the spec.) ... but there is still plenty left to be desired about the clarity of labels.
- What @jo-mueller says is true: there is "enough" information here to disambiguate: "Without the presence of the labels group under the image's metadata, I would say it's safe to assume that you are looking at an image and not a label"...
  However, that's a rather indirect inference, which is not a great spec design. And also: what exactly is meant by "the presence of the labels group under the image's metadata"? Do you mean:
  - a zarr group/directory named "labels", at my_image.zarr/labels, that contains {"attributes": {"ome": {"labels": []}}} inside my_image.zarr/labels/zarr.json, or
  - by saying "under the image's metadata", are you referring to some characteristic of the json document itself (unrelated to the zarr hierarchy)?
- @dstansby says: "This means that every valid image group is also a valid image-label group, and vice versa." Which is sort of true... but I think a strict interpretation, put into words, could still be: "if you don't see "image-label" there, it's just an Image." This would, of course, be a rather strange interpretation if that json object happened to be found inside a "labels" group, e.g. at my_image/labels/cell_segmentation/zarr.json. But technically speaking, that is how an application would be obligated to interpret it if it happened to find such a zarr group that lacked "image-label" but did contain "multiscales" (regardless of its position in the broader hierarchy ... which, again, shouldn't have to be traversed to infer semantics).
- @dstansby: if I were you, I would just make image_label mandatory on ImageLabelAttrs classes (in other words, don't worry about what the spec says, make the change to your model anyway). While making it image_label: Label | None = None is indeed consistent with the word "SHOULD" in the spec, it doesn't answer the question "but what happens if that key is not there?", to which the answer appears to be "well, then it's just a regular ImageAttrs". (So: ImageLabelAttrs is a subclass of ImageAttrs, but it doesn't go the other way.) Then your discriminated unions "just work" in pydantic without complex, error-prone downcasting.
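The subclass relationship described above can be sketched with plain dataclasses. This is an illustration of the idea, not ome-zarr-models-py's actual classes: ImageAttrs and ImageLabelAttrs here are simplified, hypothetical stand-ins, pydantic is swapped out for stdlib dataclasses, and parse_ome is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class ImageAttrs:
    """Hypothetical, simplified image model (only the field needed here)."""
    multiscales: list[dict[str, Any]]


@dataclass
class ImageLabelAttrs(ImageAttrs):
    """Every label image is an image, plus a mandatory image_label (no default)."""
    image_label: dict[str, Any]


def parse_ome(ome: dict[str, Any]) -> Union[ImageAttrs, ImageLabelAttrs]:
    """Return the most specific model the metadata satisfies; no downcasting needed."""
    if "multiscales" not in ome:
        raise ValueError("missing 'multiscales': neither an image nor a label image")
    if "image-label" in ome:
        return ImageLabelAttrs(multiscales=ome["multiscales"], image_label=ome["image-label"])
    # Key absent: "well, then it's just a regular ImageAttrs".
    return ImageAttrs(multiscales=ome["multiscales"])
```

Because image_label has no default, an ImageLabelAttrs can never be built from metadata that lacks the key, while any label image still passes an isinstance(..., ImageAttrs) check.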
in any case: reverting this back to MUST would be an easy non-breaking change (since it was already MUST in 0.4) that would resolve at least a little ambiguity.
additional thoughts about the discovery of labels
The spec simply says
In OME-Zarr, Zarr arrays representing pixel-annotation data are stored in a group called "labels".
But "are stored in a group" is vague. Is that normative? "MUST" pixel annotations be in a sub-group called "labels"? Or is it just a convention... and must applications look inside of all groups in the top level hierarchy (regardless of their name) to check whether they are OME nodes that contain the key "labels"?
- If the name "labels" is normative... then that would appear to be the only case in the spec where the naming of the folders inside the zarr hierarchy actually matter, right? Is that intentional?
- if the name "labels" is not normative... then that would appear to be the only case in the spec where manual directory exploration is required in order to discover subgroups, rather than direct metadata-driven discovery (multiscales.datasets have path, plate.wells have paths, well.images have paths).
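For comparison, here is roughly what discovery would look like if the name "labels" were not normative. This is a minimal sketch assuming a local directory store with Zarr v3 zarr.json files; scan_child_groups_for_labels is a hypothetical helper. Every child of the image has to be opened and parsed, because nothing in the image's own metadata points at the labels group.

```python
import json
import tempfile
from pathlib import Path


def scan_child_groups_for_labels(image_root: Path) -> dict[str, list[str]]:
    """Hypothetical discovery if the name "labels" were NOT normative.

    Opens every child directory's zarr.json and keeps the groups whose OME
    metadata contains a "labels" key, whatever the directory is called.
    """
    found: dict[str, list[str]] = {}
    for child in sorted(image_root.iterdir()):
        meta_file = child / "zarr.json"
        if not meta_file.is_file():
            continue
        meta = json.loads(meta_file.read_text())
        if meta.get("node_type") != "group":
            continue  # e.g. the resolution-level arrays 0/, 1/, ...
        ome = meta.get("attributes", {}).get("ome", {})
        if "labels" in ome:
            found[child.name] = list(ome["labels"])
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "0").mkdir()
        # Stub array metadata: only node_type is read by the scan.
        (root / "0" / "zarr.json").write_text(json.dumps({"zarr_format": 3, "node_type": "array"}))
        (root / "segmentations").mkdir()
        group = {"zarr_format": 3, "node_type": "group",
                 "attributes": {"ome": {"version": "0.5", "labels": ["cells"]}}}
        (root / "segmentations" / "zarr.json").write_text(json.dumps(group))
        print(scan_child_groups_for_labels(root))
```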
My understanding is...
that would appear to be the only case in the spec where the naming of the folders inside the zarr hierarchy actually matter, right?
- Correct. If you want to store labels for image.zarr, put them in image.zarr/labels/cells/ and put "labels": ["cells"] in image.zarr/labels/zarr.json.
- If you put that same cells image under image.zarr/cells, that wouldn't make the data "invalid"; it just means that viewers etc. wouldn't be expected to find it and display it as labels.
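The layout described above can be sketched as follows. This is a minimal example assuming a local directory store; write_group and add_labels are hypothetical helpers, and the label groups are stubs (a real label image also needs "multiscales" metadata and its arrays).

```python
import json
import tempfile
from pathlib import Path


def write_group(path: Path, ome: dict) -> None:
    """Write a minimal Zarr v3 group zarr.json holding OME metadata under "ome"."""
    path.mkdir(parents=True, exist_ok=True)
    meta = {"zarr_format": 3, "node_type": "group", "attributes": {"ome": ome}}
    (path / "zarr.json").write_text(json.dumps(meta, indent=2))


def add_labels(image_root: Path, names: list[str]) -> None:
    """Create image.zarr/labels/zarr.json listing each label image by name."""
    labels_dir = image_root / "labels"
    write_group(labels_dir, {"version": "0.5", "labels": names})
    for name in names:
        # Stub only: a real label image also needs "multiscales" and its arrays.
        write_group(labels_dir / name, {"version": "0.5", "image-label": {}})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "image.zarr"
        add_labels(root, ["cells"])
        print((root / "labels" / "zarr.json").read_text())
```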
that would appear to be the only case in the spec where manual directory exploration is required in order to discover subgroups, rather than direct metadata-driven discovery
- Also correct! If you're a client reading image.zarr and you want to know if it has labels, you have to manually check whether image.zarr/labels exists. So this is limited exploration of a known sub-directory, and is therefore possible with any zarr implementation, including zarr-python etc.
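The client-side check can be sketched like this. It is a minimal example assuming a local directory store and reading zarr.json directly rather than going through a zarr library; find_labels is a hypothetical helper. The only step beyond the image's own metadata is probing the fixed name "labels".

```python
import json
import tempfile
from pathlib import Path


def find_labels(image_root: Path) -> list[str]:
    """Return the label names listed in image_root/labels/zarr.json ([] if absent)."""
    labels_meta = image_root / "labels" / "zarr.json"
    if not labels_meta.is_file():
        return []  # no labels group: the image simply has no labels
    meta = json.loads(labels_meta.read_text())
    return list(meta.get("attributes", {}).get("ome", {}).get("labels", []))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "image.zarr"
        print(find_labels(root))  # before any labels group exists
        (root / "labels").mkdir(parents=True)
        meta = {"zarr_format": 3, "node_type": "group",
                "attributes": {"ome": {"version": "0.5", "labels": ["cells"]}}}
        (root / "labels" / "zarr.json").write_text(json.dumps(meta))
        print(find_labels(root))
```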
So, both those statements are correct. I would say that "labels" is normative.
I think ome-zarr-models-py can easily state that "path/to/data.zarr" won't be interpreted as a Label unless data.zarr/zarr.json contains "image-label": {}.
When I read the spec saying label images SHOULD contain "image-label": {}, I interpret it as "You SHOULD add "image-label": {} to your label images, so that they are recognised as labels".
That is kinda the same as saying "If you want to GUARANTEE that your label images are recognised as labels, then they MUST include "image-label": {}".
Hmmm - I'm not sure if I'm making myself clear here?!
That is kinda the same as saying "If you want to GUARANTEE that your label images are recognised as labels, then they MUST include "image-label": {}".
exactly :joy: ... it's sort of a "polite" way to say "if you want these things to be considered as labels, erm, you really should add this key, otherwise we will just treat your awkward binary masks as images". And that's all fine and good, but it would be equally served, and clearer, by saying "to be considered labels, it MUST have this key" (with the implicit subtext being "otherwise it's just an image")
If you want to store labels for image.zarr, put them in image.zarr/labels/cells/ and put labels:["cells"] in image.zarr/labels/zarr.json
I'm curious whether it was also discussed to not have a specially named folder, but rather put "labels" directly next to "multiscales", re-enabling metadata-only discovery?
my_image/
├── zarr.json # Group metadata with "multiscales", and "labels": ["cells"]
├── 0/ # Resolution level 0 (highest resolution)
│ └── zarr.json # Array metadata
├── 1/ # Resolution level 1
│ └── zarr.json
└── cells/ # Optional labels group
└── zarr.json # Group metadata with "multiscales", and "image-label"
it would seem at first glance to achieve the same thing as a single specially named folder with a single metadata key pointing to new folders?
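Under the layout proposed above (which is not what the 0.5 spec says), discovery would become metadata-only, just like multiscales dataset paths. A minimal sketch, assuming a local directory store; labels_from_image_metadata is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def labels_from_image_metadata(image_root: Path) -> list[Path]:
    """Metadata-only discovery under the proposed layout (not the 0.5 spec).

    The image's own zarr.json lists its labels next to "multiscales", so the
    label groups are found without probing any directory names.
    """
    meta = json.loads((image_root / "zarr.json").read_text())
    ome = meta.get("attributes", {}).get("ome", {})
    # Each entry is a path relative to the image group.
    return [image_root / p for p in ome.get("labels", [])]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my_image"
        root.mkdir()
        meta = {"zarr_format": 3, "node_type": "group",
                "attributes": {"ome": {"version": "0.5", "multiscales": [{}], "labels": ["cells"]}}}
        (root / "zarr.json").write_text(json.dumps(meta))
        print(labels_from_image_metadata(root))
```

The trade-off is that the image group's metadata must be rewritten whenever a label image is added, whereas the current design confines that change to labels/zarr.json.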