
Handling “acquisitions” in plate & well reading

Open jluethi opened this issue 1 year ago • 14 comments

We are looking to save multiplexing data (⇒ multiple cycles of acquisitions for the same plate) to HCS OME-Zarr files. Reading in the OME-NGFF spec, plates have the acquisitions key. This appears like a good way to save multiple acquisitions to the same plate.

I tested this with a small test dataset containing 2 acquisitions for a tiny plate (single well, single large image per well & acquisition): https://www.dropbox.com/s/tj8jj5iu5yxinz9/2_acquisitions.ome.zarr.zip?dl=0

This example has 2 images per well (/FOVs, see https://github.com/ome/ngff/pull/137), 1 per acquisition (the exact same images duplicated across both acquisitions for testing purposes). When loading the images individually into napari with the napari-ome-zarr plugin, I get the desired behavior of all 3 channels of both acquisitions being loaded ⇒ 6 channels & 2 label channels.
[screenshot: image loading]

When loading the whole well or the plate, I get behaviors that don’t match this. Loading as a well, I get the two acquisitions as the same channel, just tiled next to each other (probably a limitation of our well loading PR that just tiles all the images and doesn’t check for acquisitions).
[screenshot: well view, 2022-09-08 at 11:28:23]

When I load the plate, I just get the first acquisition; the second acquisition is ignored.
[screenshot: plate view, 2022-09-08 at 11:28:33]

Now my big question: Is the current spec version of having multiple acquisitions what should be implemented in ome-zarr-py? Is loading multiple acquisitions as in the first image a behavior that ome-zarr-py would want to support for wells & plates? Or has the thinking changed on how to handle multiple acquisitions? I saw that there originally was acquisition loading support for plates, but it was removed in https://github.com/ome/ome-zarr-py/pull/111

No, acquisitions was just something we tried out and decided wasn't what we wanted in the spec. I only left it in temporarily so we could view a couple of the plates that I'd created in the meantime.

@will-moore : Was this just for the original implementation of acquisitions (as an additional level), or also for the new spec?

If the current acquisition logic in the spec is what should be supported, we’d start using this for our multiplexing data and happy to have a go at trying to implement this for napari-ome-zarr plate & well reading. But if the thinking on this topic has changed, I’d be curious to join discussions on how to save multiple acquisitions to a Zarr file.

jluethi avatar Sep 08 '22 09:09 jluethi

Hi - when you say...

When loading the images individually into napari with the napari-ome-zarr plugin, I get the desired behavior of all 3 channels of both acquisitions being loaded ⇒ 6 channels & 2 label channels.

I tried to reproduce that with your data and $ napari 2_acquisitions.ome.zarr/B/03/0/ but I only see the 3 image layers and 1 label layer, which is what I'd expect. I don't know anywhere that we've tried to combine separate HCS images by "stacking" them in napari - we've always tended to "stitch" them in X and Y.

There aren't any plans to change the acquisition handling in the NGFF spec, as far as I know. cc @chris-allan @melissalinkert. The change you referred to above was pre-v0.1 release and was removed before the v0.1 HCS spec.

However, I'm not sure of the best way to handle acquisitions in napari. I don't think that loading them all as additional layers (as in your first screenshot) should be the default behaviour for a whole Plate in napari. Maybe a whole Well would be OK, but the amount of data you'd load for a Plate might be too much, since at least the lowest resolution tile for each acquisition of each Well would be loaded initially, even if you never wanted to look at the nth acquisition.

Ideally I think you'd want some other UI widget in napari to move between acquisitions. I haven't looked at napari UI plugins recently, so I don't know if it's possible for a reader plugin to show UI widgets for a particular data format? cc @tlambert03 ?

will-moore avatar Sep 08 '22 13:09 will-moore

I tried to reproduce that with your data and $ napari 2_acquisitions.ome.zarr/B/03/0/ but I only see the 3 image layers and 1 label layer, which is what I'd expect.

What I did is run napari 2_acquisitions.ome.zarr/B/03/0/ and then manually drag & drop 2_acquisitions.ome.zarr/B/03/1/ into the viewer => both files open.

For our use cases of multiplexing, this is actually the representation we'd want. Both acquisitions are of the same location, but we imaged different markers. Thus, it's important to be able to visualize them all together. At some level, what we have are "extra channels" that we want to see, but they are grouped into acquisitions. Thus, "stacking" acquisitions would be the "natural" thing I'd expect if the acquisitions are multiplexing data (or some other form of actually imaging the same location multiple times). Stitching in x & y makes sense to me if we have multiple images in the same acquisition which are really just different fields of view of the microscope. We avoid doing this with https://github.com/ome/ngff/pull/137 (=> all of our wells consist of a single, large image for each acquisition).

Thus, my thinking would be that multiple images of the same acquisition should get tiled, while multiple images of different acquisitions would get stacked. Are there other use-cases for acquisitions where we would expect a different behavior?

I don't think that loading them all as additional layers (as in your first screenshot) should be the default behaviour for a whole Plate in napari.

For us, the alternative we'd consider for multiplexing would just be actually making them separate channels in the same acquisition (=> not using acquisitions), because each channel in each acquisition is a separate channel the user may want to visualize. Regarding performance on a plate level, I think there are 2 points:

  1. Yes, the fewer images that need to be loaded for the whole plate, the better the performance => https://github.com/ome/ngff/pull/137 as a suggestion to explicitly allow saving the whole well as a single image pyramid. But given that the acquisition level is above the channels, we would have separate images in the well for each acquisition. Still, many use-cases will have 1-5 acquisitions or so, so loading all acquisitions doesn't get terribly slow. And if we can't load later acquisitions on a plate level, there isn't much of a point in saving them to OME-Zarr that way.
  2. One could consider setting the visibility flag to false by default for most channels if there are many channels => users can toggle on just the channels that they want to see and have an easy way to switch between which combination of 3-4 channels they want to look at at any given moment. => then performance should remain the same for browsing, while the channels UI allows users to pick which channels they actually want to see.
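Point 2 maps onto napari's reader-plugin contract, where a reader returns LayerData tuples and the metadata dict accepts layer keyword arguments such as `visible`. A minimal sketch under that assumption (`layerdata_for_acquisitions` and its input structure are made up for illustration, not ome-zarr-py API):

```python
import numpy as np

def layerdata_for_acquisitions(images_by_acquisition):
    """Build napari LayerData tuples, showing only the first acquisition.

    `images_by_acquisition` maps acquisition id -> (name, array); this
    structure is hypothetical.
    """
    layers = []
    for idx, (acq_id, (name, data)) in enumerate(
        sorted(images_by_acquisition.items())
    ):
        meta = {
            "name": f"acq{acq_id}_{name}",
            # Only the first acquisition is visible by default; the others can
            # be toggled on in the napari layer list, and if the arrays are
            # lazy (e.g. dask) hidden layers cost nothing upfront.
            "visible": idx == 0,
        }
        layers.append((data, meta, "image"))
    return layers

layers = layerdata_for_acquisitions({
    1: ("DAPI", np.zeros((1, 4, 4))),
    2: ("GFP", np.zeros((1, 4, 4))),
})
```

A reader hook would simply return `layers`; napari then renders the first acquisition and leaves the rest hidden until requested.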

Ideally I think you'd want some other UI widget in napari to move between acquisitions.

Certainly interesting for some use cases, could also be achieved if we could group layers, right? And for some applications, we want to see multiple acquisitions at once => If possible, I'd have a strong preference for using the napari layer UI to allow channel selection and the lazy loading (which may be off for most channels by default) to then access the data.

I think there is a place for an OME-Zarr reader plugin with widgets (I'm currently toying with a version that can load ROIs from AnnData tables), but it's also very powerful if the standard plugin can give access to the full image data.

jluethi avatar Sep 08 '22 13:09 jluethi

Another option is to use another non-Channel dimension to stack the acquisitions. Although the current NGFF spec is limited to 5 dimensions, there is no such restriction on the data that is passed to napari. Adding an extra dimension would give you an "acquisition" slider to move between acquisitions (similar to a Time or Z slider). This would be suitable if the channels were the same for each acquisition, but this is not true in your use-case. It might also be expected that the images for each acquisition have the same TCZYX shape, although it could be possible to work around shape mismatches when stacking them. Off the top of my head, this seems to be the best way to support multiple acquisitions in napari (and possibly other viewers).
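The extra-dimension idea can be sketched with numpy (standing in for the lazy dask arrays a reader would actually build); `stack_acquisitions` and its zero-padding workaround for shape mismatches are illustrative assumptions, not existing API:

```python
import numpy as np

def stack_acquisitions(arrays):
    """Stack per-acquisition TCZYX arrays along a new leading axis,
    zero-padding smaller arrays so all shapes match."""
    target = tuple(
        max(a.shape[i] for a in arrays) for i in range(arrays[0].ndim)
    )
    padded = []
    for a in arrays:
        pad = [(0, t - s) for s, t in zip(a.shape, target)]
        padded.append(np.pad(a, pad))
    # Result is (acquisition, T, C, Z, Y, X): napari would show a slider
    # for axis 0, much like a Time or Z slider.
    return np.stack(padded, axis=0)

stacked = stack_acquisitions(
    [np.ones((1, 3, 1, 8, 8)), np.ones((1, 3, 1, 6, 6))]
)
```

As noted above, this only makes sense when the channels are the same for each acquisition; the padding is one possible workaround for mismatched Y/X shapes.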

But in your case, it seems that you really have multi-channel images where the channels just happen to be acquired at different rounds of imaging. It probably makes more sense to store these as extra channels. In this case, other tools (e.g. vizarr) would also display them as you'd want.

will-moore avatar Sep 08 '22 14:09 will-moore

But in your case, it seems that you really have multi-channel images where the channels just happen to be acquired at different rounds of imaging. It probably makes more sense to store these as extra channels. In this case, other tools (e.g. vizarr) would also display them as you'd want.

Hmm, displaying them as many channels of the same acquisition certainly would be the fastest & easiest way to start displaying those channels in napari. But we will need to store metadata about the acquisition round for each channel and we will need the ability to process data per acquisition. I fear that if we go down the channel route, we're building a second, redundant way of storing "acquisition"/"cycle" metadata. And given that we're using this for an image processing platform we're building to encourage people to use OME-NGFFs, I'm quite hesitant to deviate from the spec if not fully necessary.

If the acquisition part of the OME-NGFF really isn't meant for multiplexing data, then it makes sense to me that we'd invent something new. But my impression is that the acquisition structure in OME-NGFF actually fits the model for multiplexing data very well.

Thus, it would be great if we can find a way to use that structure and find a way to also visualize it appropriately. Is there a place where there is an overview of the different use-cases for how to visualize acquisitions in napari? I don't fully understand all the different use-cases yet that would use the acquisition approach. If there are examples where my suggestion below would be bad, let me know!

Use-cases I would be aware of for acquisitions in the HCS context:

  1. Multiplexing techniques (=> many channels in different acquisitions)
  2. Combination of imaging modalities (e.g. one acquisition is EM data at 1 resolution, the other IF data at a different resolution)
  3. Imaging the exact same thing multiple times (I haven't seen a reason why one would do this in an HCS context)

Here is a suggestion of how the visualization could be handled and if that's something that's acceptable for ome-zarr-py / napari-ome-zarr, happy to start working on a PR to implement this:

  • The plate loader of ome-zarr-py would load all channels of all acquisitions, but default to only setting the first acquisition to visible (I guess that would be logic in napari-ome-zarr). That way, all image data is accessible in the interface, but performance should stay the same as before if one only wants to see the first acquisition.
  • [Alternatively to hard defaults on visibility, one could also rely on metadata about whether a channel should be shown. Or respect that metadata if it's present and otherwise default to only showing the first acquisition]
  • The images for each acquisition are placed in the same coordinate system, but each acquisition could define its own transformation metadata that would be respected. That way, there would be a clear approach if one wants acquisitions to appear next to each other (define an appropriate transformation) or on top of each other (no transformation necessary if it's the same scale, correct scale information necessary if we combine different imaging modalities).
  • If napari adds this, one could eventually group channels belonging to the same acquisition into a layer group: https://github.com/napari/napari/issues/970 But even a flat hierarchy would be quite useful for intermediate numbers of acquisitions (we've used flat acquisition hierarchies for dozens of channels in the past in our custom viewer without issues)
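The per-acquisition transformation point could lean on the NGFF coordinateTransformations metadata that multiscales images already carry. A simplified sketch of folding such a list into napari's `scale`/`translate` layer parameters (the metadata values are made up, only scale and translation types are handled, and translations are assumed to be expressed in the output space):

```python
def napari_params(coordinate_transformations, ndim):
    """Fold a list of NGFF-style coordinateTransformations into napari's
    `scale` and `translate` layer keyword arguments (simplified: ignores
    transform ordering subtleties)."""
    scale = [1.0] * ndim
    translate = [0.0] * ndim
    for t in coordinate_transformations:
        if t["type"] == "scale":
            scale = [s * v for s, v in zip(scale, t["scale"])]
        elif t["type"] == "translation":
            translate = [d + v for d, v in zip(translate, t["translation"])]
    return {"scale": scale, "translate": translate}

# Hypothetical metadata: the second acquisition is shifted in X so that it
# appears next to the first rather than on top of it.
params = napari_params(
    [{"type": "scale", "scale": [1.0, 0.65, 0.65]},
     {"type": "translation", "translation": [0.0, 0.0, 1000.0]}],
    ndim=3,
)
```

Passing `params` through to the layer would then place each acquisition either overlaid (no translation) or side by side (an explicit translation), matching the bullet above.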

@will-moore Do you foresee uses of the acquisitions spec where such an approach in ome-zarr-py would be detrimental? Comparing it to the idea of stacking acquisitions as another dimension: if we can get to layer groups in napari, it would be just as easy to switch between acquisitions, without the assumption that the order of acquisitions needs to carry specific meaning & with the ability to expose things like names for the acquisitions & the channels. Stacking appears to me like an approach that handles only a subset of the use-cases. But maybe I'm not aware of important use-cases.

jluethi avatar Sep 12 '22 06:09 jluethi

That actually sounds pretty good. I don't know of many acquisitions use-cases, but I think what you're suggesting makes sense.

Just thinking about how to implement these proposed changes...

For a napari reader, you could return a single LayerData tuple (as we do now) with the acquisitions stacked along the C dimension. There is also the option of returning multiple LayerData tuples. https://napari.org/stable/plugins/guides.html#readers

However, probably a tougher question is what ome-zarr-py returns for a Plate with acquisitions. Currently, the reader traversal of an NGFF plate and labels returns a node for the Plate and a node for the Labels. You probably don't want to change that, so it means that the Plate.data needs to represent all your acquisitions.

I imagine you could add another for-loop in https://github.com/ome/ome-zarr-py/blob/13165f410c209cc4e50e81b9ac0ce850c367a9b5/ome_zarr/reader.py#L558 to iterate through the acquisitions and concatenate them on the C axis. The tile_name for get_tile() could include the acquisition - NB: the comment """tile_name is 'level,z,c,t,row,col'""" is out-dated, as it is only 'row,col' now.

Currently, it is assumed that the data loaded for each Well data = self.zarr.load(path) is the same or compatible shape and can be concatenated together. If you have different shapes for different acquisitions, you'll need to check that this is handled. I guess a different size-C might not be an issue if you are stacking along the C axis, but X, Y, Z or T might be.
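The concatenation step described above might be sketched like this; numpy stands in for the dask arrays the reader would actually return lazily, and `concat_acquisitions_on_c` is a hypothetical helper rather than ome-zarr-py API. The explicit shape check mirrors the compatibility concern about differing shapes:

```python
import numpy as np

def concat_acquisitions_on_c(tiles):
    """Concatenate per-acquisition TCZYX tiles along the channel axis (1).

    Differing C sizes are fine when stacking along C; all other axes must
    match, which is exactly the shape concern raised above.
    """
    ref = tiles[0].shape[:1] + tiles[0].shape[2:]
    for t in tiles[1:]:
        if t.shape[:1] + t.shape[2:] != ref:
            raise ValueError(
                f"incompatible non-C shape: {t.shape} vs {tiles[0].shape}"
            )
    return np.concatenate(tiles, axis=1)

# Two acquisitions with 3 and 2 channels => a merged 5-channel image.
merged = concat_acquisitions_on_c(
    [np.zeros((1, 3, 1, 8, 8)), np.zeros((1, 2, 1, 8, 8))]
)
```

With dask the same `concatenate` call stays lazy, so nothing is read until napari actually displays a channel.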

I hope I haven't missed any gotchas here... Might be good to get a 👍 from @joshmoore and/or @sbesson before you start on this approach, in case of any vetoes??

will-moore avatar Sep 14 '22 15:09 will-moore

Thanks for starting the discussion. I think https://github.com/ome/ome-zarr-py/issues/225#issuecomment-1243286934 does a good job of enumerating the main use cases. Adding a few other variants to the list:

Combination of imaging modalities (e.g. one acquisition is EM data at 1 resolution, the other IF data at a different resolution)

Another example of this multi-modal use case would be bright-field/fluorescence.

Imaging the exact same thing multiple times (I haven't seen a reason why one would do this in an HCS context)

In a similar vein, the other case I have seen is when multiple acquisitions effectively capture multiple fields of views for each well. The plates of idr0001 (https://idr.openmicroscopy.org/webclient/?show=plate-2551) are an example of this approach and have been converted to OME-NGFF - see https://idr.github.io/ome-ngff-samples/.

Reading briefly through the proposal, I support the idea of updating the reader to be able to access all acquisitions. Acquisitions are only metadata for annotating and grouping images within a well; there is no reason to exclude a subset of them. I also agree it is reasonable to assume different acquisitions within the same well should be represented in the same coordinate space.

The biggest worry is converging on a single viewer implementation that resolves all the scenarios described. Part of the discussion above around concatenating channels works nicely with the multiplexing use case, but how would this work in the idr0001 scenario mentioned above or in the multi-modal use case?

sbesson avatar Sep 22 '22 09:09 sbesson

Thanks for the comments and additional usage example of acquisitions @sbesson !

The biggest worry is converging on a single viewer implementation that resolves all the scenarios described. Part of the discussion above around concatenating channels works nicely with the multiplexing use case, but how would this work in the idr0001 scenario mentioned above or in the multi-modal use case?

I would argue that acquisition metadata should be used for images that can be placed in the same coordinate system.

I would think the multi-modal case should work well with the proposed approach of loading everything into the same coordinate space. If the right transformations & scales are applied, then they should be overlaid correctly. If no metadata is defined, then there is obviously no way to display them correctly (unless the default overlay happens to be correct).

The idr0001 example is a bit different. I can't fully follow what you describe in the OMERO viewer (where would I see acquisition metadata?). But based on your description, that seems like a use of acquisitions that I wouldn't be expecting. Aren't multiple fields of view just supposed to be saved as images (=> https://github.com/ome/ngff/pull/137) within the same well? The way I would understand the acquisition metadata, they would all belong to the same acquisition. Loading multiple fields of view/images of the same well also isn't currently supported in napari-ome-zarr (https://github.com/ome/ome-zarr-py/issues/200) & not really great performance-wise, but that is a separate story. Thus, I would argue idr0001 should probably store the different fields of view as images of the same acquisition. If that's not possible, I'd argue that transformations should be defined for each field of view to place them correctly.

If the idr0001 layout can't be avoided and no metadata is adjusted, I still think my proposal for loading the data would be better than the current handling. At the moment, all acquisitions beyond the first are simply ignored. In my proposal, they would all be loaded into the same coordinate space as separate channels. Thus, a user could switch the "channels" (in this case = fields of view) of interest on & off and always see the field of view they toggled on in the same position.

Part of the discussion above around concatenating channels

To avoid misunderstandings, I would not be suggesting to combine channels here, but to display every acquisition as its own channel (fitting very well for multiplexing & multi-modality datasets, while being something that can display idr0001, even if not optimally).

jluethi avatar Sep 23 '22 08:09 jluethi

I would argue that acquisition metadata should be used for images that can be placed in the same coordinate system.

As defined in the current OME-NGFF specification, acquisitions are a plate-level concept. The specification is largely derived from the OME data model, which includes the following concepts:

  • a Well contains a collection of WellSample
  • a WellSample is defined as a single Image captured within a well i.e. it is within the well coordinate framework and its position can be specified by the PositionX/PositionY attributes
  • a PlateAcquisition is a Plate-level concept that groups WellSamples belonging to different Wells that were acquired as part of the same run by the instrument.

The OME-NGFF mirrors this representation with a few modifications:

  • the concept of WellSample is dropped and the relationship is directly between the well (Zarr group implementing the well spec) and images (Zarr groups implementing the multiscales spec)
  • there is no positional metadata defined at the well level and the most relevant place to store such information would be the coordinateTransformations of each multiscales image
  • acquisitions are still defined at the level of the plate spec and are referred to by id at the level of each image element in the well spec

The idr0001 example is a bit different. I can't fully follow what you describe in the OMERO viewer (where would I see acquisition metadata?). But based on your description, that seems like a use of acquisitions that I wouldn't be expecting.

There is an element of confusion here because, for historical reasons, the OMERO viewer splits each plate acquisition into a separate virtual container. These can be seen and selected under the Plate container in the left-hand panel. For NGFF and napari, I do not think adhering to this constraint is a requirement (or even desirable).

Aren't multiple fields of view just supposed to be saved as images (=> ome/ngff#137) within the same well? The way I would understand the acquisition metadata, they would all belong to the same acquisition.

In the case of idr0001, each well has 6 fields of views saved as images and each of them being associated with a different acquisition run:

% curl https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0001A/2551.zarr/A/1/.zattrs
{
    "well": {
        "images": [
            {
                "acquisition": 2661,
                "path": "0"
            },
            {
                "acquisition": 2652,
                "path": "1"
            },
            {
                "acquisition": 2653,
                "path": "2"
            },
            {
                "acquisition": 2654,
                "path": "3"
            },
            {
                "acquisition": 2651,
                "path": "4"
            },
            {
                "acquisition": 2655,
                "path": "5"
            }
        ],
        "version": "0.4"
    }
}
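For reference, grouping the image paths of such a well by acquisition id needs nothing beyond the stdlib. This sketch uses the exact attributes shown above (`images_by_acquisition` is a hypothetical helper, not part of ome-zarr-py):

```python
import json
from collections import defaultdict

# The well .zattrs quoted above, inlined for a self-contained example.
zattrs = json.loads("""{
    "well": {
        "images": [
            {"acquisition": 2661, "path": "0"},
            {"acquisition": 2652, "path": "1"},
            {"acquisition": 2653, "path": "2"},
            {"acquisition": 2654, "path": "3"},
            {"acquisition": 2651, "path": "4"},
            {"acquisition": 2655, "path": "5"}
        ],
        "version": "0.4"
    }
}""")

def images_by_acquisition(well_attrs):
    """Group a well's image paths by their acquisition id.

    The `acquisition` key is optional in the spec, hence the .get():
    images without it end up grouped under None.
    """
    groups = defaultdict(list)
    for img in well_attrs["well"]["images"]:
        groups[img.get("acquisition")].append(img["path"])
    return dict(groups)

groups = images_by_acquisition(zattrs)
```

For idr0001 this yields six one-image groups, one per acquisition run, which is exactly the structure a reader would need to iterate over.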

Loading multiple fields of view/images of the same well also isn't currently supported in napari-ome-zarr (#200) & not really great performance-wise, but that is a separate story. Thus, I would argue idr0001 should probably store the different fields of view as images of the same acquisition. If that's not possible, I'd argue that transformations should be defined for each field of view to place them correctly.

I think what we are discussing is a different acquisition concept that is not defined in the current specification, and thus I disagree with your first statement. But I strongly second your second statement about storing coordinate transformations when relevant. Unfortunately, in the case of idr0001, the positional metadata is missing from the database in the first place, so this probably points at the need for a better representative example.

If the idr0001 layout can't be avoided and no metadata is adjusted, I still think my proposal for loading the data would be better than the current handling. At the moment, all acquisitions beyond the first are simply ignored. In my proposal, they would all be loaded into the same coordinate space as separate channels. Thus, a user could switch the "channels" (in this case = fields of view) of interest on & off and always see the field of view they toggled on in the same position.

I wonder whether the terminology is a bit misleading here. Although idr0001 highlights an issue with plates with multiple acquisitions, my understanding is that this limitation extends to any plate with multiple images per well. Currently, the plate-level view simply selects one image for each well. Taking another example from https://idr.github.io/ome-ngff-samples/ without acquisitions, the plate generated from idr0056 and available at https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0056B/7361.zarr has 384 wells with 30 fields of view each. Opening the plate in napari displays 384 stitched images.

[screenshot: plate view, 2022-09-23 at 11:40:36]

Loading a well in napari displays 30 stitched images.

[screenshot: well view, 2022-09-23 at 11:40:55]

Definitely agree there is a known limitation, as only a subset of the multiple FOVs is being displayed. I am not convinced that using separate layers is a scalable approach in scenarios like the above, where you have 4 channels and 30 fields of view per well.

sbesson avatar Sep 23 '22 12:09 sbesson

Thanks a lot for this context & for taking the time @sbesson ! I do believe there are a few different questions here. I tried to summarize them here, as well as a summary of my response to them. Details below.

1) What are acquisitions and are they used consistently?

Acquisition = run of the microscope. Mostly used correctly, but I believe idr0001 is not following that logic correctly. Used correctly, acquisitions should only come up in scenarios like multiplexing or multi-modality setups, not in "simple" cases of multi field-of-view / multi-FOV datasets

2) How do we load multi field-of-view / multi-FOV datasets on the plate level independent of the acquisition topic?

Separate topic, see e.g. discussions in https://github.com/ome/ome-zarr-py/issues/200. I don't believe the multi-FOV use case scales for plate visualization. But using a single, fused FOV per well, we could scale to the same dataset size and even load multiple acquisitions.

3) What should we now do with acquisitions and multi-FOV datasets?

In my opinion, there's not much we can do for multi-FOV full-plate visualization that scales. Our solution is to use fused FOVs as a single image per well & channel. This scales. I would suggest implementing acquisition parsing in such a way that it works like the current setup, i.e. if there are multiple FOVs for a given channel & acquisition combination, only load the first FOV, unless someone figures out something new here. And if one wants to be able to see all FOVs for a full plate, use a fusion workflow that gives a single FOV per channel & acquisition.


Details that lead me to that conclusion:

Part 1)

a PlateAcquisition is a Plate-level concept that groups WellSamples belonging to different Wells that were acquired as part of the same run by the instrument.

So we could say that acquisition = run of the microscope, right? => According to that definition, if we have multi-FOV wells, they should all belong to the same acquisition. Unless a user literally imaged 1 position per well on each run of the microscope (which would be very weird behavior and I doubt the precision of most stages would make that a good imaging approach). In most cases, one would image all the different FOVs in a well in the same acquisition, right?

And I think the multiplexing use-case actually follows this definition very well: We acquire 1-n FOVs in a well once for the first round (=> acquisition 1), then we acquire them again for the second round with different markers (=> acquisition 2).

In the case of idr0001, each well has 6 fields of views saved as images and each of them being associated with a different acquisition run:

I don't know the actual acquisition settings on the microscope, but I highly doubt each field of view was acquired in a separate run by the microscope. That would be a terribly inefficient way to image a multi-well plate! My suspicion here would be that idr0001 is using the acquisition parameter "wrong" (the acquisitions don't represent a run of the microscope).

Part 2)

my understanding is that this limitation extends to any plate with multiple images per well

Yes, though I think that's a bit of a separate discussion. We originally thought that the best idea would be to store FOVs in that way and load all of the FOVs for a given well or for the full plate. On a well level, that can still be loaded as you show above. But for a full plate like idr0056, this scales really badly (see our long discussion from June in https://github.com/ome/ome-zarr-py/issues/200). The problem here is that for the idr0056 plate, one would need to load 384*30 tiny pyramid tiles for the low-resolution overviews, and that for each channel. Loading > 10'000 tiny files has way worse performance than loading 384 whole-well pyramids (it can be orders of magnitude depending on pyramid levels & file numbers, see some numbers here: https://github.com/ome/ome-zarr-py/issues/200#issuecomment-1170846459)
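The file-count argument can be made concrete with a quick back-of-the-envelope calculation, assuming (optimistically) a single chunk per image at the lowest pyramid level; the numbers match the idr0056 example above:

```python
wells, fovs, channels = 384, 30, 4

# Per-FOV layout: one tiny low-resolution chunk per FOV per channel must be
# fetched just to draw the plate overview.
reads_per_fov_layout = wells * fovs * channels

# Fused layout (ome/ngff#137): one pre-stitched image per well per channel.
reads_fused_layout = wells * channels

print(reads_per_fov_layout, reads_fused_layout)  # prints: 46080 1536
```

A 30x reduction in request count before any real latency or per-file overhead is considered, which is why the fused representation scales so much better for plate overviews.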

Thus, we came to the conclusion that the only scalable way to save wells with many FOVs was to combine the FOVs into a single array and save that as a single image for each channel. See PR to the spec for this here: https://github.com/ome/ngff/pull/137

How we should handle multi-FOV plates in the same acquisition (like idr0056) is a separate discussion, but my current conclusion is that with such an on-disk representation, that will never scale very gracefully.

Part 3)

Back to the discussion about acquisitions: I am not proposing that we use acquisitions as a way to load multi-FOV use-cases. If someone put acquisitions on top of their multi-FOV use case like in idr0001, I believe that was probably not the correct use of acquisitions and thus performance would not be outstanding. But for the cases where acquisitions are used for multiple runs of a microscope, e.g. multiplexing or multi-modality imaging, the number of channels should scale much more gracefully. And with the proposed way of loading acquisitions above, we could actually load them with ome-zarr-py. What we do with multi-FOV use cases that are all within the same acquisition (as I'd normally expect them to be) is a separate question, and I don't know whether there will be a good answer to that.

PS: I can't access idr0056, I get an NoSuchKey error message. But I did assume now that idr0056 used acquisitions = run of the microscope => all the FOVs are of the same acquisition.

jluethi avatar Sep 23 '22 13:09 jluethi

So we could say that acquisition = run of the microscope, right? => According to that definition, if we have multi-FOV wells, they should all belong to the same acquisition. Unless a user literally imaged 1 position per well on each run of the microscope (which would be very weird behavior and I doubt the precision of most stages would make that a good imaging approach). In most cases, one would image all the different FOVs in a well in the same acquisition, right?

At least, for idr0001, I'll literally quote the relevant part of the materials and methods in the paper:

Imaging used the OperaLX high-throughput microscope system (spinning disk confocal microscope, PerkinElmer, USA) with
a fully automated 1.2 NA 60x water immersion objective, CCD camera, and IR autofocus. Settings for the screen were: 96
well plate-layout, one field per well, z-stack (16 planes, 0.4 μm separation). Using exposure channels green (488 nm, 120
ms, full power (~5000 μW)) and blue (405nm, 80 ms, full power (~1000 μW)). Each plate was filmed automatically six times
(using the Remote Control Scheduler, PerkinElmer) to obtain six different image fields for each well, specifically 5 central
positions plus one rim position to ensure overall a usable density of cells for each well/strain.

Looking at the structure of the raw data which can be downloaded via Aspera, for each plate, there are 6 measurement folders which are mapped into acquisition runs which image the same wells (A1 == 001001001) at different positions:

[sbesson@prod110-omeroreadwrite JL_120731_S6A]$ find . -iname 001001001.flex | sort
./Meas_01(2012-07-31_10-41-12)/001001001.flex
./Meas_02(2012-07-31_11-56-41)/001001001.flex
./Meas_03(2012-07-31_13-12-10)/001001001.flex
./Meas_04(2012-07-31_14-27-40)/001001001.flex
./Meas_05(2012-07-31_15-43-11)/001001001.flex
./Meas_06(2012-07-31_16-58-39)/001001001.flex

Without debating the efficiency of the approach, there are (public) examples of data generated using this modality. I believe the existence of such data is also why the OME concept of plate acquisition has been kept loosely defined as "a group of well samples".

PS: I can't access idr0056, I get an NoSuchKey error message. But I did assume now that idr0056 used acquisitions = run of the microscope => all the FOVs are of the same acquisition.

You'll probably need to access sub-paths like https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0056B/7361.zarr/.zattrs. Alternatively, you can point ome_zarr info at the top-level folder :). But minimally, you are right that there is no acquisitions metadata defined in this case. Implicitly, this can be considered as 1 run of the microscope and this is indeed an example of all FOVs belonging to the same acquisition.

sbesson avatar Sep 23 '22 14:09 sbesson

🤯🤯🤯

Wow! I guess that should teach me about making any assumptions on the type of microscopy acquisitions people perform. Thanks for citing the actual details of this, would not have thought this possible if I didn't see it here!

In that case, acquisition = run of the microscope is correct. But how people run their microscope can vary greatly...

But minimally, you are right that there is no acquisitions metadata defined in this case. Implicitly, this can be considered as 1 run of the microscope and this is indeed an example of all FOVs belonging to the same acquisition.

I would assume this is by far the most common case, a single acquisition run to image a whole plate once.


For me, the question then becomes what we want to do when loading acquisitions for viewing. After your example above, I'm not sure anymore whether there is a single framework for loading that will scale optimally for all ways of doing acquisitions.

Currently, ome-zarr-py does not do anything with the acquisition metadata (and, due to performance concerns, only loads a single image/FOV per well). What I'm proposing would be a way to load acquisitions for multiple common ways of having acquisitions (multiplexing, multi-modality). We could make this optional, e.g. have a flag that can be turned on or off for acquisition loading? I imagine in an optimal case, idr0001 would be loaded like idr0056, though we also don't have a solution to load that full dataset (all FOVs) at scale in a performant manner (for the full plate).

jluethi avatar Sep 23 '22 15:09 jluethi

In that case, acquisition = run of the microscope is correct. But how people run their microscope can vary greatly...

Agreed and based upon OME's experience of dealing with formats and data, heterogeneity is simply part of the reality. As you said, different acquisition choices will have implications in terms of performance and/or visualization and it is also valuable to identify and discuss recommended layouts for different modalities.

For me, the question then becomes what we want to do when loading acquisitions for viewing. After your example above, I'm not sure anymore whether there is a single framework for loading that will scale optimally for all ways of doing acquisitions. We could make this optional, e.g. have a flag that can be turned on or off for acquisition loading?

From my side, an API that allows controlling the layout used for loading plate data would be an interesting approach to support different scenarios. Another benefit of this idea would be its extensibility, so that different consumers could specify the way data needs to be loaded for their application.
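One possible shape for such an API, purely as a sketch (nothing like this exists in ome-zarr-py today; all names here are hypothetical):

```python
from enum import Enum

class AcquisitionLayout(Enum):
    """Hypothetical strategies for presenting a plate's acquisitions."""
    FIRST_ONLY = "first_only"    # current behaviour: later acquisitions ignored
    AS_CHANNELS = "as_channels"  # proposal above: stack along C, first visible
    AS_AXIS = "as_axis"          # extra stacked dimension with its own slider

def plan_plate_load(acquisition_ids, layout):
    """Return which acquisitions a (hypothetical) plate reader would load
    for the requested layout."""
    if layout is AcquisitionLayout.FIRST_ONLY:
        return acquisition_ids[:1]
    return list(acquisition_ids)

# A consumer picks the layout that suits its application.
plan = plan_plate_load([0, 1], AcquisitionLayout.AS_CHANNELS)
```

Keeping the strategy as an explicit parameter is what makes the approach extensible: a new use case only needs a new enum member and loading rule, without changing the default behaviour for existing consumers.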

sbesson avatar Sep 26 '22 12:09 sbesson

Thanks a lot for the discussion!

In that case, I'll start looking into creating an implementation (draft) for how this scenario could be loaded best. Will keep these ideas about the heterogeneity & extensibility in mind :) It will take a bit of time to get this done, but hopefully we can come up with a PR that will then be a more concrete suggestion for this.

jluethi avatar Sep 26 '22 13:09 jluethi