ome-zarr-py

bioformats2raw metadata support

[Open] joshmoore opened this issue 3 years ago • 17 comments

  • [x] Add Implicit spec to loop over metadata-less "collections"
  • [x] Add Leaf & Root specs
  • [x] Support entrypoint-based specs ("ome_zarr.spec")
  • [x] Use entrypoint to add support for https://github.com/ome/ngff/pull/112
  • [ ] add tests for SHOULD/MAY portions of the spec
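The entrypoint-based specs in the checklist above are registered under the "ome_zarr.spec" entry-point group. As a minimal sketch of how such plugins can be discovered with only the standard library: the helper name discover_spec_plugins is hypothetical, this is not ome-zarr-py's own loader, and what each loaded object must implement is not shown here.

```python
from importlib.metadata import entry_points
from typing import Any, Dict

# Entry-point group named in the checklist above.
SPEC_GROUP = "ome_zarr.spec"


def discover_spec_plugins(group: str = SPEC_GROUP) -> Dict[str, Any]:
    """Load every object registered under an entry-point group.

    Hypothetical helper for illustration only.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports .select(group=...).
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict mapping group name -> entry points.
        selected = eps.get(group, [])
    # ep.load() imports the target module and returns the referenced object.
    return {ep.name: ep.load() for ep in selected}
```

A plugin package would advertise itself in its packaging metadata, for example with a pyproject.toml table [project.entry-points."ome_zarr.spec"] containing a line such as myspec = "mypackage.spec:MySpec" (hypothetical names).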

Part of the investigation of metadata in https://github.com/ome/ngff/issues/104. This "implicit" group is the cheapest form of collection imaginable.

Currently, only groups within the given group (and not arrays or explicit files) will be further parsed.
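A rough sketch of this Implicit behaviour, assuming a Zarr v2 directory store (a group is a directory holding a .zgroup file; attributes live in .zattrs). "No metadata" is approximated here as an empty or missing .zattrs, which is an assumption: the real spec applies when no other spec matched. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def read_attrs(group_dir: Path) -> dict:
    """Return the group's .zattrs as a dict ({} if the file is absent)."""
    attrs_file = group_dir / ".zattrs"
    if attrs_file.is_file():
        return json.loads(attrs_file.read_text())
    return {}


def is_group(path: Path) -> bool:
    """A Zarr v2 group on disk is a directory containing a .zgroup file."""
    return path.is_dir() and (path / ".zgroup").is_file()


def implicit_children(group_dir: Path) -> List[Path]:
    """Return the child groups of a metadata-less group, sorted by path.

    Only sub-groups are followed; arrays (.zarray) and plain files are skipped.
    """
    if not is_group(group_dir) or read_attrs(group_dir):
        # Not a group, or it carries its own attributes: not "implicit".
        return []
    return sorted(p for p in group_dir.iterdir() if is_group(p))
```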

joshmoore, Mar 02 '22

Codecov Report

Patch coverage: 77.41%; project coverage change: -0.89% (warning)

Comparison is base (8964374) 84.79% compared to head (28155d5) 83.90%.

Note: the current head 28155d5 differs from the pull request's most recent head 836dfd2. Consider uploading reports for commit 836dfd2 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #174      +/-   ##
==========================================
- Coverage   84.79%   83.90%   -0.89%     
==========================================
  Files          13       14       +1     
  Lines        1473     1591     +118     
==========================================
+ Hits         1249     1335      +86     
- Misses        224      256      +32     
Impacted Files                Coverage Δ
ome_zarr/reader.py            83.52% <65.38%> (-3.18%) ↓
ome_zarr/bioformats2raw.py    86.11% <86.11%> (ø)

... and 9 files with indirect coverage changes


codecov[bot], Mar 02 '22

See https://github.com/ome/ome-zarr-metadata/releases/tag/0.1.0 for an example of an entrypoint. After creating a fake .zgroup under the output of "bioformats2raw a.fake /tmp/a.ome.zarr", I get:

$ ome_zarr info /tmp/a.ome.zarr/0/test/
/private/tmp/a.ome.zarr/0/test [zgroup]
 - metadata
   - Implicit (1)
   - Leaf (2)
 - data
/private/tmp/a.ome.zarr/0 [zgroup]
 - metadata
   - Multiscales
   - Leaf (2)
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)
/private/tmp/a.ome.zarr [zgroup]
 - metadata
   - bioformats2raw (3)
   - Root (2)
 - data

Notice:

  1. the Implicit spec scans groups that have no other metadata
  2. Leaf/Root work their way up and back down a hierarchy
  3. bioformats2raw reads OME/METADATA.ome.xml
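The "up" half of the Leaf/Root behaviour in item 2 can be pictured as walking parent directories until the enclosing Zarr v2 group ends. This is a hypothetical sketch over a local directory store (find_root is an illustrative name), not the code in this PR:

```python
from pathlib import Path


def is_group(path: Path) -> bool:
    """A Zarr v2 group on disk is a directory containing a .zgroup file."""
    return path.is_dir() and (path / ".zgroup").is_file()


def find_root(start: Path) -> Path:
    """Walk up from `start` while the parent is still a Zarr group.

    Returns the top-most enclosing group.
    """
    current = start.resolve()
    if not is_group(current):
        raise ValueError(f"{start} is not a Zarr group")
    # Stop at the filesystem root or at the first non-group parent.
    while current.parent != current and is_group(current.parent):
        current = current.parent
    return current
```

A reader would then descend again from the returned root, which corresponds to the "back down" half.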

joshmoore, Mar 03 '22

For an Image in a Plate (12 Wells: rows A-C, columns 1-4, all Wells with labels), without this PR I get:

$ ome_zarr info 251.zarr/A/1/0/
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0 [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (3, 1024, 1344)
   - (3, 512, 672)
   - (3, 256, 336)
   - (3, 128, 168)
   - (3, 64, 84)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels [zgroup] (hidden)
 - metadata
   - Labels
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
 - data
   - (1, 1024, 1344)
   - (1, 512, 672)
   - (1, 256, 336)
   - (1, 128, 168)
   - (1, 64, 84)
   - (1, 32, 42)

And with this PR I get all the sibling Wells in row A (A2, A3, A4), but not B1-B4 or C1-C4, and I don't get labels for those Wells:

$ ome_zarr info 251.zarr/A/1/0/
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0 [zgroup]
 - metadata
   - Multiscales
   - OMERO
   - Leaf
 - data
   - (3, 1024, 1344)
   - (3, 512, 672)
   - (3, 256, 336)
   - (3, 128, 168)
   - (3, 64, 84)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels [zgroup] (hidden)
 - metadata
   - Labels
   - Leaf
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1/0/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
   - Leaf
 - data
   - (1, 1024, 1344)
   - (1, 512, 672)
   - (1, 256, 336)
   - (1, 128, 168)
   - (1, 64, 84)
   - (1, 32, 42)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/1 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A [zgroup]
 - metadata
   - Implicit
   - Leaf
 - data
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/2 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/3 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr/A/4 [zgroup]
 - metadata
   - Well
   - Leaf
 - data
   - (3, 1024, 1344)
/Users/wmoore/Desktop/ZARR/data/v4_omero/plates/251.zarr [zgroup]
 - metadata
   - Plate
   - Root
 - data
   - (3, 768, 1344)

Without this PR, napari 251.zarr/A/1/0/ gives me just the one image plus its labels:

[Screenshot 2022-03-23 at 10.47.04]

With this PR, I get everything as for 'info' above: all the A Wells, but labels only for A1:

[Screenshot 2022-03-23 at 10.46.07]

will-moore, Mar 23 '22

> and with this PR I get all the sibling A Wells A2, A3, A4, but not B1-B4 or C1-C4. And I don't get labels for those Wells.

@will-moore, do you approve of getting the siblings (assuming napari can be fixed)? If so, I'll look into why it's only for the one row.

Also, where do you get the labels for this plate? Is this the one you had a script for?

joshmoore, Apr 13 '22

@will-moore, I've reverted the upwards parsing. It seemed like a good strategy, but there are currently too many edge cases. I don't have labeled plates for testing at the moment, but I think that with the current state, along with https://github.com/ome/ome-zarr-metadata/commit/08e12f784e085adbf4ca6d384720443108f96cb6#diff-0bb17e0ecb4ac83835ee3800a1af71a12f644b0ce782c623ba97f8917916250eR54, all of the following should be true:

             non-bf2raw    bf2raw
HCS          unchanged     unchanged
non-HCS      unchanged     now loads all images

The only other change I can think of: if you pass a group that previously did nothing, it will now likely try to load its contents.

joshmoore, Apr 13 '22

In a discussion today with @dgault, @sbesson, @jburel and @melissalinkert, a case was made for at least adding the flag (Leaf) so that clients can detect that there is more information to load. Additional methods or parameters should then allow that loading.

joshmoore, Apr 18 '22

To improve the codecov results, see https://github.com/zarr-developers/numcodecs/pull/300/files#diff-bc37cd9860eec1facdc18a47798e8a1a2c0ef5dabd999deee049de4a48a5d35fR1 for one option for testing entrypoints in-repo.

joshmoore, Apr 18 '22

@joshmoore, to help address "don't have labels on plates for testing", I created https://gist.github.com/will-moore/0f4cb6b1fdd60a255fcbb956a54a645e, which adds labels to a plate by segmenting one of the channels (it currently assumes the image axes are cyx).

I don't know if I'm missing something (maybe I'm not using ome_zarr properly), but iterating through the Wells of a Plate feels quite manual: parsing JSON by hand, joining paths, and calling parse_url() for every Well and every Image.
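For reference, the manual iteration described above looks roughly like this with the standard library alone. It assumes the NGFF 0.4 plate layout (plate.wells[].path in the plate's .zattrs, well.images[].path in each Well's .zattrs); iter_plate_images is a hypothetical helper, not an ome_zarr API.

```python
import json
from pathlib import Path
from typing import Iterator


def _attrs(group_dir: Path) -> dict:
    """Read a group's .zattrs, returning {} when the file is missing."""
    f = group_dir / ".zattrs"
    return json.loads(f.read_text()) if f.is_file() else {}


def iter_plate_images(plate_dir: Path) -> Iterator[Path]:
    """Yield the path of every image in every Well of a plate.

    Assumes plate.wells[*].path values like "A/1" and
    well.images[*].path values like "0".
    """
    plate = _attrs(plate_dir)["plate"]
    for well in plate["wells"]:
        well_dir = plate_dir / well["path"]
        for image in _attrs(well_dir)["well"]["images"]:
            yield well_dir / image["path"]
```

Each yielded path would still have to be opened separately (for instance via parse_url(), as mentioned above); the sketch only removes the JSON and path bookkeeping.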

will-moore, Apr 27 '22

See a quick use of this functionality:

  • https://github.com/ome/ome-zarr-metadata/pull/1
  • https://github.com/ome/napari-ome-zarr/pull/47

joshmoore, May 04 '22

Migrated the bf2raw implementation from https://github.com/ome/ome-zarr-metadata :

$ bioformats2raw-0.5.0-SNAPSHOT/bin/bioformats2raw 'my&series=2.fake' test_output
$ ome_zarr info test_output/
/opt/ome-zarr-py/test_output [zgroup]
 - metadata
   - bioformats2raw
 - data
/opt/ome-zarr-py/test_output/0 [zgroup]
 - metadata
   - Multiscales
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)
/opt/ome-zarr-py/test_output/1 [zgroup]
 - metadata
   - Multiscales
 - data
   - (1, 1, 1, 512, 512)
   - (1, 1, 1, 256, 256)
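A container like the one above can be recognised from its root attributes. A minimal sketch, assuming the root .zattrs holds {"bioformats2raw.layout": 3} and that the images sit in consecutively numbered groups 0, 1, ...; it ignores the OME group and any metadata inside it, and bf2raw_series is a hypothetical name.

```python
import json
from pathlib import Path
from typing import List

LAYOUT_KEY = "bioformats2raw.layout"


def bf2raw_series(root: Path) -> List[Path]:
    """Return the numbered image groups of a bioformats2raw-style container.

    Returns [] when the root does not declare layout version 3.
    """
    attrs_file = root / ".zattrs"
    if not attrs_file.is_file():
        return []
    attrs = json.loads(attrs_file.read_text())
    if attrs.get(LAYOUT_KEY) != 3:
        return []
    series = []
    index = 0
    # Collect "0", "1", ... until the first missing group.
    while (root / str(index) / ".zgroup").is_file():
        series.append(root / str(index))
        index += 1
    return series
```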

joshmoore, Sep 15 '22

Should we also discuss the name of the module itself?

joshmoore, Sep 22 '22

We added an omero block of channel and rendering metadata to the multiscales .zattrs (because it came from OMERO), but we actually want other tools to read and write this metadata, and the naming may discourage that. In the same way, bioformats2raw.layout is a spec that just happens to have been produced first by bioformats2raw, but really it's a spec that ALL tools should read and write. Is it too late to think about a different name there, or has the name already stuck?

will-moore, Sep 22 '22

Other than the string bioformats2raw.layout, we're pretty free to change things here. (I'd say we definitely don't want to repeat what we did with omero, and we actually need to think about how to make that "transitional" as well.)

joshmoore, Sep 22 '22

Ah, yes: it's too late to change the "bioformats2raw.layout" key, because data generated with it already exists.

will-moore, Sep 22 '22

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/intermission-ome-ngff-0-4-1-bioformats2raw-0-5-0-et-al/72214/1

imagesc-bot, Sep 28 '22

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/saving-volumetric-data-with-voxel-size-colormap-annotations/85537/24

imagesc-bot, Aug 30 '23