ome-zarr-py

Best practices for generating multiscale zarr data?

Open · GenevieveBuckley opened this issue · 5 comments

What is the current best practice for generating & saving a multiscale zarr array, given a single resolution of that data?

I gather things have changed a lot recently with the improvements to OME NGFF, so I feel like I need to ask the question. I've talked to a few people who say they use a Python script that they or someone else in the lab wrote, but then they say it might be a little bit hacky and they're not completely sure whether it's compliant with the latest NGFF specification.

I've looked at the docs, but it hasn't completely clarified things for me. The write_multiscale function seems like the best option, but requires users to have already generated the resolution levels externally (so the question is still, what is the best practice recommendation for that). Worse, write_multiscale appears to only take in a list of numpy arrays, which is a little odd. If I could reliably fit my high resolution data in memory as a numpy array, I wouldn't need to use zarr at all.
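One way to produce the resolution levels that write_multiscale expects is a plain 2x2 block mean over the y and x axes. The sketch below assumes NumPy, assumes the last two axes are (y, x), and uses a hypothetical helper name, block_mean_pyramid, which is not part of ome-zarr-py. Level 0 is the full-resolution input and each later level halves y and x, so the list is ordered full resolution first.

```python
import numpy as np


def block_mean_pyramid(image, max_layers=4, factor=2):
    """Hypothetical helper: build [full_res, level1, ...] by averaging
    non-overlapping factor x factor blocks in the last two (y, x) axes.
    Leading axes (e.g. t, c, z) are left untouched."""
    levels = [image]
    current = image
    for _ in range(max_layers):
        ny, nx = current.shape[-2] // factor, current.shape[-1] // factor
        if ny == 0 or nx == 0:
            break  # too small to downsample further
        # Trim so y and x are exact multiples of the factor.
        trimmed = current[..., : ny * factor, : nx * factor]
        # Split y into (ny, factor) and x into (nx, factor), then average the factor axes.
        blocks = trimmed.reshape(trimmed.shape[:-2] + (ny, factor, nx, factor))
        # Cast back to the input dtype (for integer images this truncates the fraction).
        current = blocks.mean(axis=(-3, -1)).astype(image.dtype)
        levels.append(current)
    return levels


pyramid = block_mean_pyramid(np.zeros((2, 512, 512), dtype=np.uint8))
print([level.shape for level in pyramid])
# [(2, 512, 512), (2, 256, 256), (2, 128, 128), (2, 64, 64), (2, 32, 32)]
```

The returned list could then be passed as the pyramid argument of ome_zarr.writer.write_multiscale(pyramid, group). Because this version is NumPy-only, it still needs the full-resolution array in memory, which is exactly the limitation raised above; for larger-than-memory data the same reshape-and-mean idea could be expressed lazily (for instance with dask.array.coarsen), but treat that as an untested assumption.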

The regular function for writing a single image (write_image) seems to have a keyword argument for downsampling (scaler), but there's not much information on what that argument should look like, or how to use the feature. (Unless I've just missed it, please point me to the right section of the docs if there's more info somewhere!)
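As far as I understand it, the downsampling keyword on write_image takes an ome_zarr.scale.Scaler object (e.g. Scaler(method="nearest")) rather than a bare function, so it is worth checking the docstring of the installed version. Conceptually, any downsampling strategy has the same contract: take the full-resolution array and return a list of levels, full resolution first. The hypothetical function below (nearest_levels, not an ome-zarr-py API) sketches that contract with nearest-neighbour downsampling over the last two axes; the parameter names max_layer and downscale are chosen to resemble Scaler's, but treat that as an assumption.

```python
import numpy as np


def nearest_levels(base, max_layer=4, downscale=2):
    """Hypothetical downsampling callable: return [base, level1, ...],
    shrinking y and x at each level by keeping every `downscale`-th pixel
    (nearest-neighbour). Leading axes (t, c, z) are not downsampled."""
    levels = [base]
    for _ in range(max_layer):
        prev = levels[-1]
        if min(prev.shape[-2:]) < downscale:
            break  # nothing left to downsample in y or x
        # Strided slicing returns a view; no new pixel values are created.
        levels.append(prev[..., ::downscale, ::downscale])
    return levels


levels = nearest_levels(np.arange(50).reshape(2, 5, 5))
print([level.shape for level in levels])
# [(2, 5, 5), (2, 3, 3), (2, 2, 2), (2, 1, 1)]
```

Because strided slicing returns views, calling np.ascontiguousarray on each level before writing may be worthwhile. Nearest-neighbour never produces values that were not already in the input, which is why it is the usual choice for label images.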

GenevieveBuckley, Jul 24 '22 08:07

Hi @GenevieveBuckley, there are convenience functions for also creating the multiscales in ome_zarr. Here's an example workflow script I wrote to demonstrate the usage: https://github.com/ome/ome-ngff-prototypes/blob/main/workflows/spatial-transcriptomics-example/convert_transcriptomics_data_to_ngff.py#L39-L64 (Though I fully agree that overall this needs to be better documented ...)

Also note that the local_mean option currently does not work (see #217), but you can use e.g. nearest instead.
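To see what switching from local_mean to nearest changes, here is a small NumPy comparison on a made-up 4x4 label image. Nearest-neighbour keeps every second pixel; local mean averages each 2x2 block. Neither line is ome-zarr-py's own implementation, just the two ideas written out by hand for illustration.

```python
import numpy as np

# A toy 2-D label image: objects with ids 1 and 3 on background 0.
labels = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 3],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
], dtype=np.uint8)

# Nearest-neighbour: keep every second pixel in y and x.
nearest = labels[::2, ::2]
print(nearest.tolist())  # [[1, 0], [0, 3]]

# Local mean: reshape to (block_y, in_y, block_x, in_x) and average each 2x2 block.
local_mean = labels.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(local_mean.tolist())  # [[1.0, 0.75], [0.0, 3.0]]
```

The 0.75 in the mean result is not a valid label, so for segmentation masks nearest is arguably the better choice anyway; for intensity images, a mean usually gives a smoother, less aliased result.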

(Sorry, closed by accident)

constantinpape, Jul 26 '22 14:07

@toloudis / @will-moore: thoughts on rolling out (and/or testing) https://github.com/ome/ome-zarr-py/pull/192 here?

joshmoore, Jul 26 '22 15:07

Makes sense to me. At best, it will probably lead to improvements, and it may confirm some of the performance issues I was seeing with large data and dask resizing.

It might also be instructive to look at this pull request in aicsimageio, which builds on top of #192: https://github.com/AllenCellModeling/aicsimageio/pull/381. It includes an .ipynb notebook demonstrating how to load a single-resolution image and save a multiresolution zarr. The code that forwards the arrays to ome-zarr-py lives inside the OmeZarrWriter.

toloudis, Jul 27 '22 00:07

:+1: @GenevieveBuckley, just one more minor change on that PR and then I'll get it released. Happy to have some testing either before or after.

joshmoore, Aug 03 '22 19:08