Generalize well organization in high-content screening: field of view => image

I would like to suggest a change to the wording of the OME-NGFF HCS plate specification and add some recommendations on how the structure of the image pyramids in each well affects visualization performance. Specifically, I propose that the OME-NGFF spec explicitly allow a whole well to be saved as a single image. As a consequence, the components of a well would be images, not fields of view, because an image could already consist of multiple fields of view stitched together.
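To make the proposed wording concrete, here is a minimal Python sketch of the well-level metadata for the two layouts. It assumes the OME-NGFF 0.4 `well` attributes, with an `images` list of `path` entries; the helper `well_metadata` is hypothetical and only for illustration. The metadata structure is the same in both cases; only what each image entry represents changes.

```python
import json


def well_metadata(n_images: int, version: str = "0.4") -> dict:
    """Build hypothetical `well` attributes for a well holding n_images images.

    Each entry in `images` points at an image group inside the well
    (paths "0", "1", ...). Under the proposed wording, an entry may be a
    single field of view or a fused image covering many fields of view.
    """
    return {
        "well": {
            "images": [{"path": str(i)} for i in range(n_images)],
            "version": version,
        }
    }


# Layout A: one image per microscope field of view (here 9 FOVs).
per_fov = well_metadata(9)
# Layout B: the same 9 FOVs fused into a single image for the whole well.
fused = well_metadata(1)

print(json.dumps(fused, indent=2))
```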

Motivation

We would like to use OME-Zarr files to store TB-sized, multi-channel, 3D high-content imaging data in the HCS format. We are building Fractal, an open-source image processing pipeline for data in HCS OME-Zarr. One of the benefits of saving such large datasets as OME-Zarr is the possibility of interactive visualization, e.g. in the napari viewer. When we tested how this approach scales to large HCS plates, we ran into problems when saving every field of view of the microscope as a separate image in each well of the OME-Zarr file. We started the discussion about this topic here: https://github.com/ome/ome-zarr-py/issues/200. The approach of saving a single image per well is discussed in more detail starting here: https://github.com/ome/ome-zarr-py/issues/200#issuecomment-1167251097

To summarize very briefly: saving many fields of view (FOVs) per well as separate images, each with its own full pyramid hierarchy, leads to severe IO bottlenecks. To visualize a plate at low resolution, a tiny pyramid file has to be loaded for each field of view. When a plate has more than 1000 fields of view across all its wells, this becomes very slow. Even in a case with just 72 fields of view and 3 pyramid levels, loading was already 8 times slower with the FOVs saved as separate image pyramids than with a single image pyramid. This appears to be a fundamental consequence of how much slower it is to access many small files than a single large one, and it would likely get worse on object storage than on classical file systems. See the issues above for further details.

Figure: OME_Zarr_Pyramids
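A toy count can illustrate why the per-FOV layout scales poorly. The sketch below is a simplified model, not a benchmark: it assumes Zarr v2 storage (one `.zattrs` and one `.zarray` read per image for the level being shown), 2x downsampling per pyramid level, square FOVs on a square grid without overlap, and a single channel and z-plane. `overview_files` is a hypothetical helper.

```python
import math


def overview_files(n_wells: int, fovs_per_well: int, fov_px: int,
                   n_levels: int, chunk_px: int, fused: bool) -> int:
    """Rough number of files read to show every well at its lowest level."""
    factor = 2 ** (n_levels - 1)           # downsampling of the lowest level
    if fused:
        grid = math.isqrt(fovs_per_well)   # FOVs per row/column of the well
        low_px = grid * fov_px // factor   # well edge length at lowest level
        n_images = n_wells                 # one image per well
    else:
        low_px = fov_px // factor          # FOV edge length at lowest level
        n_images = n_wells * fovs_per_well # one image per FOV
    chunks = math.ceil(low_px / chunk_px) ** 2
    # Per image: .zattrs + .zarray of the level read, plus its chunks.
    return n_images * (2 + chunks)


# Hypothetical example: 8 wells x 9 FOVs = 72 FOVs, 3 pyramid levels.
print(overview_files(8, 9, 2048, 3, 1024, fused=False))  # 216
print(overview_files(8, 9, 2048, 3, 1024, fused=True))   # 48
```

This model does not reproduce the measured 8x slowdown, which also depends on storage latency and library overhead. It only shows that, at low resolution, every separately stored FOV costs at least a few file reads no matter how few pixels it contributes, so the count grows with the number of FOVs rather than with the amount of pixel data.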

Our solution has therefore been to store each well as a single, fused image. In the discussions on this issue, there was openness to making this approach part of the spec, so I have created this PR to suggest a change that explicitly allows it and mentions the trade-offs. I hope this PR can be the place to discuss this further and to decide whether it can make it into the OME-NGFF spec.
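For the fused layout, the FOVs of a well have to be combined before writing. Below is a minimal NumPy sketch, assuming equally sized 2D tiles acquired row by row on a regular grid with no overlap; real data would usually need stage positions and possibly registration. `fuse_grid` and `pyramid` are hypothetical helpers, and the 2x mean downsampling is just one simple choice of pyramid method.

```python
import numpy as np


def fuse_grid(tiles, n_rows, n_cols):
    """Place equally sized 2D tiles, given in row-major order, into one array."""
    tile_h, tile_w = tiles[0].shape
    well = np.zeros((n_rows * tile_h, n_cols * tile_w), dtype=tiles[0].dtype)
    for i, tile in enumerate(tiles):
        r, c = divmod(i, n_cols)  # grid position of tile i
        well[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = tile
    return well


def pyramid(image, n_levels):
    """Return n_levels arrays, each 2x mean-downsampled from the previous one.

    Odd trailing rows/columns are dropped for simplicity.
    """
    levels = [image]
    for _ in range(n_levels - 1):
        prev = levels[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        down = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(down.astype(image.dtype))
    return levels


# 9 FOVs of 4x6 pixels, each filled with its own index, fused into a 3x3 well.
tiles = [np.full((4, 6), i, dtype=np.uint16) for i in range(9)]
well = fuse_grid(tiles, n_rows=3, n_cols=3)
levels = pyramid(well, n_levels=3)
print([lvl.shape for lvl in levels])  # [(12, 18), (6, 9), (3, 4)]
```

The resulting levels could then be written as one multiscale image per well (for example with ome-zarr-py), so the low-resolution levels of a well live in a few reasonably sized chunks instead of one tiny array per FOV.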

Open questions

How should we specify the trade-offs? I'm proposing a "Note" here, but I'm open to other approaches. Also, is this Note syntax correct, and does it work for multi-line paragraphs?

Is the explanation of the trade-offs understandable? See here: 20261ace44a4be387f02225fbef93ef8281b1aa5

Note: Trade-offs on how data is structured per well: Fields of view of the microscope MAY be saved as individual images in each well to allow maximal flexibility regarding translations between fields of view. However, wells with many individual images do not scale for visualisation of large plates: visualisation tools would need to read all the tiny pyramid files of every field of view to create overviews, and this IO becomes a major limiting factor. In that case, all fields of view SHOULD be saved as a single, combined image, which keeps the pyramid chunks at a reasonable size for the low-resolution representations of a well.

I think it is important to move away from the field-of-view naming in the spec when wells can be collections of images. However, two keys in the plate metadata contain "field" in their name. How should we proceed with these? For maximumfieldcount: does it describe the maximum number of fields of view per well, or in total? Is the wording "images per well" therefore correct, or should it be "images in the whole plate" (though then what does "maximum" mean; isn't that just a count)? For field_count: is it per well or per plate? Its description says "fields per view", but what is a view?

Aug 29 '22 13:08 jluethi

Automated Review URLs

Aug 29 '22 13:08 github-actions[bot]

Thanks for that. I feel that the terms MAY and SHOULD are about the rules of the spec itself and probably shouldn't be used in this context? I think you can drop one or two sentences and be a bit less explicit, and users will still understand. How about this:

"Fields of view of the microscope may be saved as individual images in each well to allow for maximal flexibility regarding translations between fields of view. However, having wells with many individual images does not scale well for visualisation of large plates. In that case, combining the fields and saving them as a single image per well is likely to improve performance."

Aug 29 '22 13:08 will-moore

maximumfieldcount is the largest number of fields in any single well. Please feel free to modify the description of this term to clarify this in the spec. It comes from the OME model: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome.html
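Following that definition, both values can be derived from the well metadata. A minimal sketch, assuming OME-NGFF 0.4 plate and well attributes (`field_count` at plate level, `maximumfieldcount` on each entry of `acquisitions`, optional `acquisition` ids on well images) and assuming, per this discussion, that both mean "largest number of images in any single well". `plate_field_counts` is a hypothetical helper. If each well holds a single fused image, both values are 1.

```python
from collections import defaultdict


def plate_field_counts(wells: dict) -> dict:
    """Derive field counts from per-well metadata.

    `wells` maps a well path (e.g. "A/1") to that well's attributes,
    i.e. {"well": {"images": [{"path": "0", "acquisition": 0}, ...]}}.
    """
    # field_count: largest number of images in any single well.
    field_count = max(len(attrs["well"]["images"]) for attrs in wells.values())

    # maximumfieldcount: per acquisition, the largest number of images
    # that acquisition contributes to any single well.
    per_acq_max = defaultdict(int)
    for attrs in wells.values():
        counts = defaultdict(int)
        for image in attrs["well"]["images"]:
            if "acquisition" in image:
                counts[image["acquisition"]] += 1
        for acq_id, n in counts.items():
            per_acq_max[acq_id] = max(per_acq_max[acq_id], n)

    return {
        "field_count": field_count,
        "acquisitions": [
            {"id": acq_id, "maximumfieldcount": n}
            for acq_id, n in sorted(per_acq_max.items())
        ],
    }


wells = {
    "A/1": {"well": {"images": [
        {"path": "0", "acquisition": 0},
        {"path": "1", "acquisition": 0},
        {"path": "2", "acquisition": 1},
    ]}},
    "A/2": {"well": {"images": [{"path": "0", "acquisition": 0}]}},
}
counts = plate_field_counts(wells)
print(counts)
```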

Aug 29 '22 13:08 will-moore

Thanks @will-moore

Regarding "How about this": sounds great, I shortened it that way.

Regarding "modify the description": thanks for the confirmation. In that case, I guess the key should keep the name maximumfieldcount, and my wording change should be correct. I also slightly updated the description of field_count to be (hopefully) clearer.

Aug 29 '22 17:08 jluethi

@will-moore Just checking in: What is the process or timeline to get this change into the OME-NGFF spec? Is there a chance it will be part of the 0.5 spec? Do I need to talk to some people or convince someone else first that this would be a good idea? No stress at all, just wanted to check in whether I should be doing something about this PR :)

Oct 26 '22 17:10 jluethi

I would expect this to be included in v0.5 spec, especially since it's more like advice than a change in spec. Anything else needed here @sbesson?

Nov 03 '22 10:11 will-moore

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/faim-hcs-functions-to-work-with-hcs-data/78868/11

Mar 27 '23 08:03 imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/using-naparis-new-not-yet-released-async-functionality-to-browse-large-ome-zarr-hcs-plates/86984/1

Oct 03 '23 19:10 imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/best-approach-for-appending-to-ome-ngff-datasets/89070/3

Nov 23 '23 12:11 imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/fractal-framework-zarr-compatibility/92536/2

Feb 22 '24 15:02 imagesc-bot
