List sources of sample data

Open joshmoore opened this issue 2 years ago • 8 comments

@folterj was looking for sample datasets and came across three (unsuitable) locations:

  • https://github.com/ome/ome-zarr-py/tree/master/tests/data (dataless to keep them small)
  • https://www.openmicroscopy.org/2020/11/04/zarr-data.html (older version)
  • A hackmd where we are preparing just this list :smile:

The spec page itself should be the definitive starting point (for now) to find sample data.

joshmoore avatar Sep 30 '22 08:09 joshmoore

Hi @joshmoore, I didn't realise it's possible to open a Zarr by URL, and after updating packages (in particular the obscure 'aiohttp') this is working fine. It may be nice to have some smaller samples as downloadable archives, but I understand the whole point of distributed/Zarr is the online access.
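
For reference, a minimal sketch of what "opening a Zarr by URL" involves at the HTTP level, using only the Python standard library. It assumes a Zarr v2 layout, where group attributes live in a `.zattrs` JSON file, and v0.4-style `multiscales` metadata. The helper names and the inline sample metadata are hypothetical. In practice zarr together with fsspec does this for you, and fsspec's HTTP backend is what pulls in aiohttp.

```python
import json
from typing import List
from urllib.request import urlopen


def zattrs_url(root_url: str) -> str:
    """Return the URL of the group-level attributes file (Zarr v2 layout assumed)."""
    return root_url.rstrip("/") + "/.zattrs"


def dataset_paths(zattrs: dict) -> List[str]:
    """List the resolution-level paths of the first multiscales entry
    (v0.4-style metadata assumed)."""
    multiscales = zattrs.get("multiscales", [])
    if not multiscales:
        return []
    return [d["path"] for d in multiscales[0].get("datasets", [])]


def fetch_dataset_paths(root_url: str, timeout: float = 30.0) -> List[str]:
    """Fetch .zattrs over HTTP and return the resolution-level paths.
    Needs network access; not called in this sketch."""
    with urlopen(zattrs_url(root_url), timeout=timeout) as response:
        return dataset_paths(json.load(response))


# Hypothetical, hand-written metadata to illustrate the parsing step offline.
SAMPLE_ZATTRS = json.loads(
    '{"multiscales": [{"version": "0.4",'
    ' "datasets": [{"path": "0"}, {"path": "1"}]}]}'
)
print(dataset_paths(SAMPLE_ZATTRS))
```

Each returned path names a sub-array under the same root URL, which is why no archive download is needed to inspect a remote sample.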

folterj avatar Sep 30 '22 10:09 folterj

@folterj: so you would want Zips for a single download, rather than needing to use aws or mc to download from S3? Btw, ome-zarr-py provides a download method, but dedicated tools are more scalable.

joshmoore avatar Sep 30 '22 10:09 joshmoore

@joshmoore yes exactly. By the way thank you also for making me aware of this page including up-to-date v0.4 samples - this is really nicely done! https://idr.github.io/ome-ngff-samples/

folterj avatar Sep 30 '22 11:09 folterj

> @joshmoore yes exactly.

Understood. We'll look into putting some more (smaller) samples up on Zenodo, but for now you can find a handful under:

https://zenodo.org/search?page=1&size=20&q=ngff&access_right=open&type=dataset

joshmoore avatar Sep 30 '22 11:09 joshmoore

Following up on today's OME2022 call: happy to offer the small example OME-Zarr datasets we use for testing purposes and have put on Zenodo, e.g. https://zenodo.org/record/7144919, which passes the v0.4 ome-ngff-validator. It already contains some tables with measurements and custom ROIs (which we will make v0.5 spec compliant once that spec has been finalized).

Also, we have this tiny dataset that's just 17 / 32 MB (2D vs. 2 planes in 3D) we use in some of our automated testing: https://zenodo.org/record/7274533
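
For scripted downloads in automated tests, here is a hedged sketch of how the file list of such a record could be read from Zenodo's REST API. The helper names are hypothetical, and the response fields used (`files`, `key`, `size`, `links.self`) are an assumption about the response shape that should be checked against the Zenodo API documentation.

```python
import json
import re
from typing import Dict, List
from urllib.request import urlopen


def record_id(record_url: str) -> str:
    """Extract the numeric ID from a zenodo.org/record/<id> or /records/<id> URL."""
    match = re.search(r"zenodo\.org/records?/(\d+)", record_url)
    if match is None:
        raise ValueError(f"not a Zenodo record URL: {record_url}")
    return match.group(1)


def api_url(record_url: str) -> str:
    """Build the REST API URL for a record page URL."""
    return f"https://zenodo.org/api/records/{record_id(record_url)}"


def list_files(record: dict) -> List[Dict]:
    """Return name, size and download link per file (field names are assumptions)."""
    return [
        {
            "name": f.get("key"),
            "bytes": f.get("size"),
            "url": f.get("links", {}).get("self"),
        }
        for f in record.get("files", [])
    ]


def fetch_files(record_url: str, timeout: float = 30.0) -> List[Dict]:
    """Query the API and list the record's files.
    Needs network access; not called in this sketch."""
    with urlopen(api_url(record_url), timeout=timeout) as response:
        return list_files(json.load(response))


print(api_url("https://zenodo.org/record/7274533"))
```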

jluethi avatar Nov 10 '22 09:11 jluethi

Gathering links following ome2022 call

  • https://idr.github.io/ome-ngff-samples/
  • https://uk1s3.embassy.ebi.ac.uk/bia-zarr-test/bia_examples.html
  • https://s3.embl.de/i2k-2020/platy-raw.ome.zarr (https://s3.embl.de/i2k-2020)
  • https://webknossos.org/datasets/scalable_minds/l4dense_motta_et_al_demo#2924,4474,1770,0,3.4
  • Links to consider in https://hackmd.io/@ome/HJq5cSNV5

jburel avatar Nov 10 '22 11:11 jburel

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-ngff-test-dataset/74436/5

imagesc-bot avatar Nov 29 '22 11:11 imagesc-bot

Discussing with @erindiel, @jburel, and @pwalczysko today regarding the upcoming publication, there's a general sense that along with ngff.openmicroscopy.org we can maintain a single "data resources" page that then points to:

  • known collections of actual datasets (like https://idr.github.io/ome-ngff-samples)
  • lists of repositories tagged with a given topic (e.g. "ome-zarr-catalog")
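
A topic-based list could be generated rather than curated by hand. The sketch below assumes GitHub's repository search endpoint with a `topic:` qualifier. The topic name is the example from the list above, and the helper names are hypothetical. Unauthenticated requests are rate-limited, so treat this as an illustration only.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def topic_search_url(topic: str, per_page: int = 50) -> str:
    """Build a GitHub repository-search URL for a given topic."""
    query = urlencode({"q": f"topic:{topic}", "per_page": per_page})
    return f"https://api.github.com/search/repositories?{query}"


def repo_links(search_result: dict) -> List[Tuple[str, str]]:
    """Reduce a search response to (full_name, html_url) pairs."""
    return [
        (item["full_name"], item["html_url"])
        for item in search_result.get("items", [])
    ]


def fetch_repos(topic: str, timeout: float = 30.0) -> List[Tuple[str, str]]:
    """Run the search and return repository links.
    Needs network access; not called in this sketch."""
    request = Request(
        topic_search_url(topic),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return repo_links(json.load(response))


print(topic_search_url("ome-zarr-catalog"))
```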

A similar strategy is likely to be followed for a top-level landing page for "tool resources", which in turn links to the tools (as https://ome.github.io/ome-ngff-tools does) as well as to discovered repositories.

joshmoore avatar Jan 20 '23 16:01 joshmoore