
NGFF dataset validator

Open • thewtex opened this issue 2 years ago • 9 comments

A tool to validate whether a dataset follows the NGFF spec, with per-version validation. It would generate a visual and programmatic summary of required and optional features, plus any errors related to types, etc.

thewtex • Sep 07 '21 20:09

Related: https://github.com/ome/ome-zarr-py/issues/102

Also, @joshmoore is considering JSON-LD to define the NGFF spec, which would give a static definition that could be used for language-independent validation.

constantinpape • Sep 08 '21 07:09

In discussion with @joshmoore and @jburel (e.g. see https://github.com/ome/ngff/issues/31#issuecomment-947668553), it seems there are two types of validation that we're going to need:

  • JSON validation: check that the correct attributes are present with the correct types, etc. Hopefully this can be achieved with a schema and existing validation tools.
  • Validation against the Zarr arrays: e.g. check that the "datasets" are ordered from largest to smallest, and that the "axes" list has the same length as the number of array dimensions. This will likely need custom validation code.
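As a rough illustration of the second kind of check, a minimal Python sketch could look like the following. It assumes one entry of the "multiscales" metadata has already been parsed into a dict, and that each dataset's array shape is passed in directly (e.g. read with zarr beforehand), so the sketch has no zarr dependency. The function name `check_multiscales` is hypothetical, and reading "largest to smallest" as "no dimension grows from one level to the next" is an assumption, not wording from the spec.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def check_multiscales(
    multiscales: Dict[str, Any], shapes: Dict[str, Sequence[int]]
) -> List[str]:
    """Return error messages for one multiscales entry; empty means no problems found.

    `shapes` maps each dataset "path" to the shape of its Zarr array
    (hypothetical input format chosen for this sketch).
    """
    errors: List[str] = []
    axes = multiscales.get("axes", [])
    previous: Optional[Tuple[int, ...]] = None
    for ds in multiscales.get("datasets", []):
        path = ds["path"]
        if path not in shapes:
            errors.append(f"dataset {path!r}: array not found")
            continue
        shape = tuple(shapes[path])
        # The "axes" list should have one entry per array dimension.
        if len(axes) != len(shape):
            errors.append(
                f"dataset {path!r}: {len(axes)} axes but array has {len(shape)} dimensions"
            )
        # Assumed interpretation of "largest to smallest": compared with the
        # previous level, no dimension may get larger.
        if previous is not None and len(previous) == len(shape):
            if any(cur > prev for cur, prev in zip(shape, previous)):
                errors.append(
                    f"dataset {path!r}: shape {shape} is larger than previous level {previous}"
                )
        previous = shape
    return errors


meta = {
    "axes": [{"name": "y", "type": "space"}, {"name": "x", "type": "space"}],
    "datasets": [{"path": "0"}, {"path": "1"}],
}
print(check_multiscales(meta, {"0": (512, 512), "1": (256, 256)}))  # → []
```

Returning a list of messages rather than raising on the first problem fits the idea of a summary report covering all errors at once.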

For the JSON validation, I started looking at https://www.commonwl.org/v1.2/SchemaSalad.html; see https://github.com/ome/ngff/pull/69

cc @glyg

will-moore • Oct 25 '21 16:10

I can start looking into the second aspect by coding something in Python.

glyg • Oct 27 '21 07:10

@glyg That would be great, thanks! We imagine that an ome_zarr validate command would work in a similar way to the info command. In due course the info command could include validation (see https://github.com/ome/ome-zarr-py/issues/102).

will-moore • Oct 27 '21 08:10

We now have a JSON schema thanks to efforts by @will-moore and @sbesson: https://github.com/ome/ngff/tree/main/0.4/schemas Usage examples will follow.

constantinpape • Feb 02 '22 22:02

@will-moore @sbesson thanks for your good work on the validation! :pray: :clap:

Following:

https://github.com/ome/ngff/blob/8dec6918ee6630f43f339b281b66043e0a797ca4/0.4/schemas/image.schema#L3

it looks like the *.schema files are intended to be published to gh-pages. Is this still to do?

thewtex • Mar 29 '22 02:03

@thewtex you are right: the schemas currently live in the GitHub repository alongside the samples, but they are not published to gh-pages yet. There were several considerations around the URL naming in the original thread (https://github.com/ome/ngff/pull/76#pullrequestreview-820304335), and the publication step was deferred to a later round of review (https://github.com/ome/ngff/pull/76#issuecomment-992623979), but that follow-up was never tracked.

I don't know if we want to (ab-)use this issue or create a separate issue to go over the current URL proposal and make sure we are all happy with the decisions.

On a related note, there is also ongoing work on making these schemas available as artifacts, so that downstream tools could bundle them and use them for validation, e.g. when working offline or simply for performance reasons; see https://github.com/ome/ngff/pull/77. So far, most of the work has been driven by the Python drivers, but there are also design decisions to be made that should be fully language-agnostic. Would you have a use case for caching these schemas and using them for validation, and would that impose any particular constraints on layout or distribution?
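To make the offline use case concrete, here is a minimal Python sketch of how a downstream tool might use bundled schemas. It indexes local *.schema files by their "$id" (the URL a JSON Schema declares for itself) and only falls back to fetching the published URL when network access is explicitly allowed. The function names, the directory layout, and the in-memory cache are hypothetical and are not the layout proposed in #77; the resulting dict could be handed to a JSON Schema validator library to resolve $ref lookups without network access.

```python
import json
from pathlib import Path
from typing import Any, Dict
from urllib.request import urlopen


def index_bundled_schemas(schema_dir: Path) -> Dict[str, Dict[str, Any]]:
    """Map each bundled schema's "$id" URL to its parsed JSON content."""
    index: Dict[str, Dict[str, Any]] = {}
    for path in sorted(schema_dir.rglob("*.schema")):
        schema = json.loads(path.read_text(encoding="utf-8"))
        schema_id = schema.get("$id")
        # Schemas without an "$id" cannot be looked up by URL, so skip them.
        if isinstance(schema_id, str):
            index[schema_id] = schema
    return index


def get_schema(
    url: str, index: Dict[str, Dict[str, Any]], allow_network: bool = False
) -> Dict[str, Any]:
    """Return the schema for `url`, preferring the bundled copy."""
    if url in index:
        return index[url]
    if not allow_network:
        raise KeyError(f"schema {url} is not bundled and network access is disabled")
    # Fall back to the published URL, then cache it in memory for later lookups.
    with urlopen(url) as response:
        schema = json.loads(response.read().decode("utf-8"))
    index[url] = schema
    return schema
```

Keying on "$id" rather than on file paths keeps the lookup independent of how a given language's package lays the files out, which may matter for the language-agnostic design question.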

sbesson • Mar 31 '22 13:03

@sbesson thanks for the information!

Yes, I am looking to validate in Python, JavaScript, and C++, so both HTTP and package distribution (like #77) would be helpful. #77 looks good to me. For now, I will fetch the schemas from the GitHub repository and report what works well.

thewtex • Mar 31 '22 14:03

As a follow-up, I just opened https://github.com/ome/spec-prod/pull/2 to update the logic to publish the JSON schemas. Now is probably a good time for anyone to suggest alternate permanent URLs for these schemas before we start deploying them to the gh-pages branch.

I assume we'll also want the existing schemas to be listed on https://ngff.openmicroscopy.org/latest/, maybe as a separate section, or within each specification section?

sbesson • Apr 13 '22 15:04