ome-zarr-py icon indicating copy to clipboard operation
ome-zarr-py copied to clipboard

Plate labels fix

Open will-moore opened this issue 2 years ago • 23 comments

Fixes #65.

This refactors PlateLabels:

  • Doesn't assume that a Well exists at row:0 column:0.
  • Doesn't assume that there is a label named "0" at image/labels/0/. Reads image/labels/.zattrs for every Well to find first name under {"labels": ["first_name"]}.
  • Uses first Well Label Image e.g. A/1/0/labels/name/.zattrs to get dtype and shapes of image pyramid, to generate a dask pyramid for the Plate

Needs a fix in napari-ome-zarr to work: https://github.com/ome/napari-ome-zarr/pull/54

Screenshot 2022-06-20 at 16 42 41

will-moore avatar Jun 20 '22 15:06 will-moore

Needs a fix in napari-ome-zarr to work (PR to follow)...

With the recent community interest (https://github.com/ome/ome-zarr-py/issues/65#issuecomment-1155295470), any thoughts on next steps here, @will-moore?

joshmoore avatar Jun 23 '22 11:06 joshmoore

@joshmoore Yes, I got it working, but tests are failing - See my question above at https://github.com/ome/ome-zarr-py/pull/207#discussion_r902516548

will-moore avatar Jun 23 '22 11:06 will-moore

Codecov Report

Merging #207 (2b71cd7) into master (3cc60f4) will increase coverage by 0.29%. The diff coverage is 90.00%.

@@            Coverage Diff             @@
##           master     #207      +/-   ##
==========================================
+ Coverage   83.85%   84.15%   +0.29%     
==========================================
  Files          12       12              
  Lines        1375     1426      +51     
==========================================
+ Hits         1153     1200      +47     
- Misses        222      226       +4     
Impacted Files Coverage Δ
ome_zarr/reader.py 85.48% <90.00%> (+0.59%) :arrow_up:
ome_zarr/writer.py 96.27% <0.00%> (+0.01%) :arrow_up:
ome_zarr/io.py 84.61% <0.00%> (+0.85%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3cc60f4...2b71cd7. Read the comment docs.

codecov[bot] avatar Jun 23 '22 15:06 codecov[bot]

Unfortunately the path to the Plate and the PlateLabels is the same because the Labels are actually children of each Image, so there is not one path to a child Plate-Label. The reason for this is so that you can view an individual Image and it's labels will be below it's /labels as expected for an Image.

It's certainly expected that the labels for all Images in the plate will be the same size. I think the reading will fail if they are not. I guess it could be possible to handle that, but it seems like it might be a fair bit of work for an edge-case. If some wells are missing labels I think it should be fine, but if the first Well is missing labels, then I don't think they'll be shown.

will-moore avatar Jun 23 '22 23:06 will-moore

Unfortunately the path to the Plate and the PlateLabels is the same because the Labels are actually children of each Image, so there is not one path to a child Plate-Label. The reason for this is so that you can view an individual Image and it's labels will be below it's /labels as expected for an Image.

Would pointing a fictive node e.g. plate.zarr/labels to avoid the duplication be an option?

It's certainly expected that the labels for all Images in the plate will be the same size. I think the reading will fail if they are not. I guess it could be possible to handle that, but it seems like it might be a fair bit of work for an edge-case. If some wells are missing labels I think it should be fine, but if the first Well is missing labels, then I don't think they'll be shown.

In many ways, what this API attempts to achieve raises the question of whether there should be a lightweigth specification at the top-level allowing to register the labelsin addition to registering them at the individual image label? This would also help clarifying the assumptions about the existence/consistency of labels across wells.

sbesson avatar Jun 27 '22 10:06 sbesson

So, to determine whether the node at plate.zarr/labels matches the PlateLabels spec (does the Plate have Labels), you'd need to load the plate.zarr/.zattrs, to access the plate["wells"], check e.g. the first Well or multiple Wells, and check whether the first Image has plate/.zarr/col/row/0/labels/.zattrs.

I could try to look at that... I'm not sure whether it will fix the download issue...

will-moore avatar Jun 27 '22 17:06 will-moore

The last commit tries to support a "fictive" node at /plate.zarr/labels for the PlateLabels spec. But labels are not showing up in napari. Logs show this from napari-ome-zarr, meaning that node.data is None or has len() = 0.

10:17:42 DEBUG skipping non-data /Users/wmoore/Desktop/ZARR/data/52555.zarr/labels (hidden)

will-moore avatar Jun 29 '22 09:06 will-moore

@sbesson That adds support for plate.zarr/labels. Unfortunately you can't do:

$ napari --plugin napari-ome-zarr 52555.zarr/labels
...
ValueError: Requested path '/Users/wmoore/Desktop/ZARR/data/52555.zarr/labels' does not exist.

will-moore avatar Jun 29 '22 11:06 will-moore

Tested with the sample 2551.zarr IDR plate uploaded to https://idr.github.io/ome-ngff-samples/.

Without https://github.com/ome/napari-ome-zarr/pull/54, napari fails with

(napari) sbesson@Sebastiens-MacBook-Pro ~ % napari https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/   
Traceback (most recent call last):
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/components/viewer_model.py", line 1234, in _add_layer_from_data
    layer = add_method(data, **(meta or {}))
TypeError: ViewerModel.add_image() got an unexpected keyword argument 'color'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/components/viewer_model.py", line 1044, in _open_or_raise_error
    added = self._add_layers_with_plugins(
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/components/viewer_model.py", line 1160, in _add_layers_with_plugins
    added.extend(self._add_layer_from_data(*_data))
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/components/viewer_model.py", line 1239, in _add_layer_from_data
    raise TypeError(
TypeError: _add_layer_from_data received an unexpected keyword argument ('color') for layer type image

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/_qt/qt_viewer.py", line 754, in _qt_open
    self.viewer.open(
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/components/viewer_model.py", line 941, in open
    layers = self._open_or_raise_error(
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/components/viewer_model.py", line 1053, in _open_or_raise_error
    raise ReaderPluginError(
napari.errors.reader_errors.ReaderPluginError: Tried opening with napari-ome-zarr, but failed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/sbesson/miniconda3/envs/napari/bin/napari", line 10, in <module>
    sys.exit(main())
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/__main__.py", line 446, in main
    _run()
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/__main__.py", line 311, in _run
    viewer._window._qt_viewer._qt_open(
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/_qt/qt_viewer.py", line 762, in _qt_open
    handle_gui_reading(
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/_qt/dialogs/qt_reader_dialog.py", line 145, in handle_gui_reading
    readers = prepare_remaining_readers(paths, plugin_name, error)
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/napari/_qt/dialogs/qt_reader_dialog.py", line 200, in prepare_remaining_readers
    raise ReaderPluginError(
napari.errors.reader_errors.ReaderPluginError: Tried to read https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/ with plugin napari-ome-zarr, because it was associated with that file extension/because it is the only plugin capable of reading that path, but it gave an error. Try associating a different plugin or installing a different plugin for this kind of file.

This will probably mean some 0.x.0 release and introducing a versions dependency in napari-ome-zarr.

With both PRs installed, napari https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/ now displays the plates with the labels layer. This brings up a few usability issues:

  • the labels only display for Z=0, while in https://idr.openmicroscopy.org/webclient/img_detail/1229801/, they span the entire Z-range. This is not an issue related to this reading code but rather at the data generation level. Off-hand, how easy would it be to fix so that we could regenerate this dataset with labels of the expected dimensionality?
  • also outside the scope of this PR, is there a UI to navigate between the different fields of views/acquisitions at the plate level? At the moment, only the first field of view is displayed

Probably the most important feedback is that the label loading code introduces a significant performance overhead. The loading of the remote plate above increases from 13s to 50s before displaying the data. This performance issue is not specific to the napari workflow and also affects ome_zarr info as the label metadata is read for every field of view.

Tested briefly the ome_zarr download workflow which had been reported to be broken with the node duplication strategy (https://forum.image.sc/t/hcs-ome-zarr-data-unable-to-download/57114/9). ome_zarr download https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/ now fails with an issue similar to https://github.com/ome/ome-zarr-py/pull/207#issuecomment-1169871967 i.e.

downloading...
   2551.zarr
   2551.zarr/labels
to .
Traceback (most recent call last):
  File "/Users/sbesson/miniconda3/envs/napari/bin/ome_zarr", line 8, in <module>
    sys.exit(main())
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/ome_zarr/cli.py", line 166, in main
    ns.func(ns)
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/ome_zarr/cli.py", line 35, in download
    zarr_download(args.path, args.output)
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/ome_zarr/utils.py", line 101, in download
    zarr.group(str(target_path))
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/zarr/hierarchy.py", line 1263, in group
    return Group(store, read_only=False, chunk_store=chunk_store,
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/zarr/hierarchy.py", line 164, in __init__
    self._meta = self._store._metadata_class.decode_group_metadata(meta_bytes)
  File "/Users/sbesson/miniconda3/envs/napari/lib/python3.10/site-packages/zarr/meta.py", line 203, in decode_group_metadata
    raise MetadataError("unsupported zarr format: %s" % zarr_format)
zarr.errors.MetadataError: unsupported zarr format: None

From my side this raises two high-level questions:

  1. data representation: as shown above, attaching the PlateLabels spec either to the plate node or to a fictive node has unexpected consequences on downstream API. Conceptually, the introduction of an organic Spec implementation unspecified in https://ngff.openmicroscopy.org/ confuses me and the lack of specification is likely related to all the issues above. Do we foresee a driver for PlateLabels outside napari? If so, would it help to keep the nodes/spec classes in ome_zarr as a strict representation the OME-NGFF specification and instead try to migrate this creation of the composite layer logic at the level of napari-ome-zarr ?
  2. performance : independently of the decision on the architecture above, loading labels metadata across N nodes has a computational cost especially when working with remote datasets. Possibly, some profiling might point out towards some easy improvements in the current code. Moving forward, could we foresee making additional usage of dask for these use cases to schedule and distribute the metadata retrieval?

sbesson avatar Jul 12 '22 14:07 sbesson

For the images at e.g. https://idr.openmicroscopy.org/webclient/img_detail/1229801/ the masks have no Z-index, and they are currently exported as a 2D labels (e.g. see https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/A/1/0/labels/0/). I imagine this issue could be solved in a couple of ways:

  • We could duplicate the 2D labels for every Z-index to give us a 3D labels. This is a bit of data duplication, not so much for 1 image but increases for a Plate, and could be larger for bigger images.
  • Alternatively I wonder if we could use a coordinateTransform to specify that the single 2D mask should be transformed to span the entire Z-stack? Currently this wouldn't work for a 2D mask, since the coordinateTransform could only be applied to the 2 X/Y axes. But if we had a 3D labels data with Z-axis of size=1, then it might work (at least for a single image - could get trickier for a whole plate)

I don't know of a way for a napari reader plugin to add anything to the napari UI: https://napari.org/stable/plugins/contributions.html#contributions-readers but I guess napari-ome-zarr could also include a UI widget plugin that could perhaps access metadata associated with the Plate layer and request that data from a different Field is loaded as new layers (maybe replacing the original layers)?

Migrating all of the PlateLabels logic to napari-ome-zarr seems like a nice idea (to avoid the node-traversal issues we're seeing) but that class is a subclass of the Plate spec, so it would be a fair bit of work and would mean that napari-ome-zarr really has to know a lot more about the OME-NGFF spec, instead of just passing data and metadata to napari.

Performance: the only way I can see to avoid making so many requests is to make some assumptions: e.g. assume that the labels are all named the same for each image. This is similar to previous assumptions (in vizarr) that the path to each Image in all Wells would be the same (https://github.com/hms-dbmi/vizarr/issues/118), which had to be reverted. We do assume that all labels are the same shape and dtype.

will-moore avatar Jul 22 '22 09:07 will-moore

We could duplicate the 2D labels for every Z-index to give us a 3D labels. This is a bit of data duplication, not so much for 1 image but increases for a Plate, and could be larger for bigger images. Alternatively I wonder if we could use a coordinateTransform to specify that the single 2D mask should be transformed to span the entire Z-stack? Currently this wouldn't work for a 2D mask, since the coordinateTransform could only be applied to the 2 X/Y axes. But if we had a 3D labels data with Z-axis of size=1, then it might work (at least for a single image - could get trickier for a whole plate)

Agreed on the size increase but as per https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#Shape_TheZ, 2D labels is an incorrect representation of the original data when TheZ is not specified. At least for our representative samples, my inclination would be to go for the first option as 3D label images are clearly defined and supported. To optimize this type of conversion moving forward, two thoughts:

  • making more use of transformations as you mention. Clients will need to be updated to support such transformation
  • as the data is replicated alongside the Z axis, I wonder if a combination of using 3D chunks + compression would result in smaller sizes.

Migrating all of the PlateLabels logic to napari-ome-zarr seems like a nice idea (to avoid the node-traversal issues we're seeing) but that class is a subclass of the Plate spec, so it would be a fair bit of work and would mean that napari-ome-zarr really has to know a lot more about the OME-NGFF spec, instead of just passing data and metadata to napari.

The current delegation mechanism is attractive in terms of simplicity and allows to keep the napari-ome-zarr fairly shallow. My major issue is the ambiguity on the definition of the Spec API. I can easily see implementations growing out of bounds. For instance, let's assume a consumer would likeo overlay labels on top of the fields of views when viewing a well in napari. As per the current design, this would require the introduction of new WellsLabels(Spec) class and so on for any additional layer.

Assuming we want to separate the data/metadata retrieval logic at the level of this library and forward it to napari-ome-zarr, could we see definining a new separate top-level concept e.g. node overlay/layer allowing to register these additional representations which are effectively composites? At the consumer level, https://github.com/ome/napari-ome-zarr/pull/54/files#diff-3ac23664593f6255d26806df61d2364f3d3f0b66c53a4fad561a70877093239aR126 would generically need to test the existing of such elements and display them using their data/metadata.

Performance: the only way I can see to avoid making so many requests is to make some assumptions: e.g. assume that the labels are all named the same for each image. This is similar to previous assumptions (in vizarr) that the path to each Image in all Wells would be the same (https://github.com/hms-dbmi/vizarr/issues/118), which had to be reverted. We do assume that all labels are the same shape and dtype.

So your feeling is that in addition to the retrieval of additional data per label, the primary overhead arises from the handling of label names for each field of view? What would be the current behavior if e.g. all fields of view had a nuclei label image and a handful also had a cells label image? At least from my side, I do not see a problem with making assumptions on the homogeneity of the labels across wells in this type of composite representation.

sbesson avatar Jul 28 '22 15:07 sbesson

Maybe if when viewing a Plate with Labels (e.g in napari) then we should actually store a Plate of labels below the Plate itself at /path/to/plate.zarr/labels/ in the same way as there is for Image? That would make it clear that "this plate has labels" rather than "some of the images in this plate have labels" and make it easy to simply use the Plate spec for those labels.

The only thing that would break if we store the labels in this way is when you view an Image, you don't automatically see the Labels - For that we'd need the plate/0/1/0/labels/.zattrs labels to be relative paths, up to the plate then down to the Well like {"labels": ["../../../../labels/plate/0/1/0/"] }.

will-moore avatar Aug 01 '22 09:08 will-moore

Hmmm.... so what's the next step here, @will-moore ?

joshmoore avatar Aug 03 '22 19:08 joshmoore

@sbesson Now we're both back, do you want to discuss this PR sometime?

will-moore avatar Aug 15 '22 14:08 will-moore

Very cool to see that this discussion is ongoing, looking forward to seeing labels on full plates!

Alternatively I wonder if we could use a coordinateTransform to specify that the single 2D mask should be transformed to span the entire Z-stack? Currently this wouldn't work for a 2D mask, since the coordinateTransform could only be applied to the 2 X/Y axes. But if we had a 3D labels data with Z-axis of size=1, then it might work (at least for a single image - could get trickier for a whole plate)

I was wondering about this topic: Is there already a discussion ongoing somewhere about mixed 2D/3D datasets? Having 2D labels is one use case. Another thing we often hit is doing maximum intensity projections along Z to get better overviews of large 3D datasets. So far, we've always saved them as separate OME-Zarr plate files. I've been meaning to create a image.sc forum post or open an issue on this topic, as it would be great if there were a way to have them in the same dataset and to have a way to specify that one is a 2D-only dataset, e.g. the 2D labels should belong to all planes. While labels could be saved as 3D with data duplication, a similar thing wouldn't make sense for maximum intensity projections. Is this already being discussed somewhere? If not, but there's interest in leading the discussion, I'll be happy to start it (on github? image.sc?).

Probably the most important feedback is that the label loading code introduces a significant performance overhead. The loading of the remote plate above increases from 13s to 50s before displaying the data. This performance issue is not specific to the napari workflow and also affects ome_zarr info as the label metadata is read for every field of view.

While there is the interesting debate about what assumptions can be made and what metadata needs to be read per field, I had 2 additional questions on this topic:

  1. In our experience (see #200), having many small field of views had many performance issues and we thus switched to having a single field of view per well. Would this limit the performance impact of label loading?
  2. This question may be due to my limited understanding of Zarrs. But if a major bottleneck is reading many small metadata files whenever we want to visualize, could that be solved with some top-level metadata? If there is some metadata on the plate level about available labels, those would be easy to read. That would move some of the burden to the OME-Zarr writing part, which should happen ~once, while visualization access can happen much more. (though it may mean having to run some wrap-up operation once all the component of the zarr file have been written, as this metadata couldn't be written in parallel while e.g. different wells are written separately. And maybe metadata is still required on the per site level, if a user wants to load a single site)

jluethi avatar Aug 26 '22 06:08 jluethi

@jluethi For 2D -> 3D transforms (e.g. 2D masks/projections and 3D data) there is a collection of "user stories" at https://github.com/ome/ngff/issues/84 - Story 9 is probably most relevant to the discussion here. Those stories are part of an upcoming NGFF community call: https://forum.image.sc/t/ome-ngff-community-call-transforms-and-tables/71792

will-moore avatar Sep 20 '22 13:09 will-moore

@will-moore Thanks for linking me to the user stories. I will be joining the community calls. Was mostly drawn by the tables discussion, but now I'm also looking forward to learn more about how the transformations can be used for this 2D -> 3D mapping. Do you think such transformations could then also be used on the 2D labels instead of duplicating them along 3D?

Also, do you want some testing & feedback on this PR to display the labels? Happy to test on some different OME-Zarrs we have to check how things perform :) Don't have any large plates with labels ready just yet, but can hopefully generate them soon to test.

jluethi avatar Sep 20 '22 13:09 jluethi

@jluethi with this PR and https://github.com/ome/napari-ome-zarr/pull/54 installed, you can try:

$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr -vvv

But remember you have to move to Z-index: 0 before you see any labels:

Screenshot 2022-09-22 at 13 53 37

will-moore avatar Sep 22 '22 12:09 will-moore

@jluethi As you will see with that test command, there are a lot of requests to get the labels path for each Well (instead of assuming it is the same for every Well). It is already quite slow on a 96-well plate. I think that this is probably not scalable to a 384-well plate, particularly if you want to include multiple acquisitions!

So maybe it needs a re-think?

will-moore avatar Sep 27 '22 12:09 will-moore

Hmm, I'm just preparing some example datasets with labels for a whole plate. In a first test for a plate that had labels for 6 wells (and each well consisting of 72 images fused into a single array per well) is actually very cool to browse! The initial loading is fairly slow (~ 40s, compared to loading of 25s for the normal OME-Zarr version that doesn't load labels). But browsing is very smooth afterwards for me (using the new napari async branch 4905: https://github.com/napari/napari/pull/4905)!

I'll retest again once I've managed to actually process a whole plate with labels (labels on 23 wells instead of 6). But thought I'd share this impression at least :)

Video of plate with partial labels and this PR: https://www.dropbox.com/s/2ds0nxhe9btqaxk/napari_dev.mov?dl=0

Video of loading the same plate in standard napari 0.4.16, released napari-ome-zarr: https://www.dropbox.com/s/2ejnseb19dm8i70/standard_ome_Zarr_py.mov?dl=0

Maybe having the label layer as default visible=False for loading of the whole plate would be a good idea to speed up initial loading? That's also how it's handled for individual images and at a plate level, labels aren't that informative. They really become interesting when one starts zooming in :)

Also, this was an MIP plate, thus everything was 2D only and the browsing of labels worked well for that :)

Thanks in any case for the installation instructions @will-moore , I'm very impressed with being able to browse plates with labels in that way in napari! ❤️

jluethi avatar Sep 28 '22 11:09 jluethi

I have now created a larger plate with complete labels and tested it there. For a plate with 23 wells à 19440 x 20480 pixels (combined 8x9 field of views into a single image per well), 3 channels and 1 label channel (cellpose nuclear segmentation), it loads very well! That's about 100 GBs of image data loading remotely (via VPN & a quite slow samba share that's mounted locally) Yes, the initial load is slightly slower (difference isn't too big in napari 0.4.16, a bit bigger in the napari async branch). But afterwards, it's loading very responsively and browsing the plate with the labels is great (better in the new async scenario because that's smoother, but both work very well!).

I don't have any 384 well examples or test cases with many wells at the moment. So maybe number of label images loading slows things down a bit. But this dataset that was originally 1656 field of views can be stored in OME-Zarrs in a manner that allows for very responsive loading! :)

Here are the example videos: 1: napari 0.4.16 & released ome-zarr-py 0.6.0 (no label loading): https://www.dropbox.com/s/wrz5tsy1tmteypr/20220929_LabelPlate_napari0416_omeZarr06.mov?dl=0 2: napari 0.4.16 & the PRs of ome-zarr-py & napari-ome-zarr: https://www.dropbox.com/s/rcbh9aotmlugcr5/20220929_LabelPlate_napari0416_omeZarrPR.mov?dl=0 3. napari async PR & the PRs of ome-zarr-py & napari-ome-zarr: https://www.dropbox.com/s/sasxqw60q7r7440/20220929_napari_async_omeZarr_PR.mov?dl=0

jluethi avatar Sep 29 '22 08:09 jluethi