NGFF dataset validator
A tool to validate whether a dataset follows the NGFF spec, with per-version validation. It should generate a visual and programmatic summary of required and optional features, and of any errors related to types, etc.
Related: https://github.com/ome/ome-zarr-py/issues/102. @joshmoore is also considering JSON-LD to define the NGFF spec, giving a static definition that could be used for language-independent validation.
In discussion with @joshmoore and @jburel (e.g. see https://github.com/ome/ngff/issues/31#issuecomment-947668553), it seems there are two types of validation that we're going to need:
- JSON validation - check that the correct types and attributes are present, etc. Hopefully this can be achieved with a schema and existing validation tools.
- Validation against the Zarr arrays. E.g. check that "datasets" are ordered from largest to smallest, check that the "axes" list is the same length as the array dimensions etc. This will likely need custom validation code.
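The second, array-level kind of check could be sketched as below. This assumes the "axes" metadata and the shape of each resolution level have already been read from the Zarr store (e.g. with zarr-python); the function name and signature are illustrative, not an existing ome-zarr-py API.

```python
from math import prod

def validate_arrays(axes, shapes):
    """Return a list of error messages (an empty list means valid)."""
    errors = []
    # every resolution level must have one dimension per axis
    for i, shape in enumerate(shapes):
        if len(shape) != len(axes):
            errors.append(
                f"dataset {i}: {len(shape)} dimensions but {len(axes)} axes"
            )
    # "datasets" must be ordered from largest to smallest
    sizes = [prod(shape) for shape in shapes]
    if any(a < b for a, b in zip(sizes, sizes[1:])):
        errors.append("datasets are not ordered from largest to smallest")
    return errors

# a well-formed 2D multiscale pyramid
axes = [{"name": "y", "type": "space"}, {"name": "x", "type": "space"}]
print(validate_arrays(axes, [(512, 512), (256, 256), (128, 128)]))  # → []
# smallest level first: reported as an ordering error
print(validate_arrays(axes, [(128, 128), (512, 512)]))
```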
For the JSON validation, I have started looking at https://www.commonwl.org/v1.2/SchemaSalad.html; see https://github.com/ome/ngff/pull/69
cc @glyg
I can start looking at the second aspect by coding something in Python.
@glyg That would be great, thanks!
We imagine that an `ome_zarr validate` command would work in a similar way to the `info` command. In due course the `info` command could include validation (see https://github.com/ome/ome-zarr-py/issues/102).
We now have a JSON schema thanks to efforts by @will-moore and @sbesson: https://github.com/ome/ngff/tree/main/0.4/schemas. Usage examples will follow.
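To show the shape of what such schema-based validation checks, here is a minimal, hand-rolled structural check for illustration only. Real validation would load the published schemas from https://github.com/ome/ngff/tree/main/0.4/schemas and run them with a JSON Schema library such as `jsonschema`; the key names below follow the 0.4 "multiscales" layout.

```python
import json

def check_multiscales(attrs):
    """Return a list of error messages for the top-level .zattrs dict."""
    errors = []
    multiscales = attrs.get("multiscales")
    if not isinstance(multiscales, list) or not multiscales:
        return ["'multiscales' must be a non-empty list"]
    for i, ms in enumerate(multiscales):
        # "axes" and "datasets" are required in each multiscales entry
        for key in ("axes", "datasets"):
            if key not in ms:
                errors.append(f"multiscales[{i}] is missing required key '{key}'")
    return errors

attrs = json.loads("""{
  "multiscales": [{
    "version": "0.4",
    "axes": [{"name": "y", "type": "space"}, {"name": "x", "type": "space"}],
    "datasets": [{"path": "0"}, {"path": "1"}]
  }]
}""")
print(check_multiscales(attrs))  # → []
```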
@will-moore @sbesson thanks for your good work on the validation! :pray: :clap:
Following https://github.com/ome/ngff/blob/8dec6918ee6630f43f339b281b66043e0a797ca4/0.4/schemas/image.schema#L3, it looks like the `*.schema` files are intended to be published to gh-pages? Is this still a todo?
@thewtex you are right, the schemas are currently living in the GitHub repository alongside the samples, but they are not published to gh-pages yet. There were several considerations around the URL naming in the original thread (https://github.com/ome/ngff/pull/76#pullrequestreview-820304335) and the publication step was deferred to a round of review (https://github.com/ome/ngff/pull/76#issuecomment-992623979), but coming back to it has not been captured anywhere.
I don't know if we want to (ab)use this issue or create a separate one to go over the current URL proposal and make sure we are all happy with the decisions.
On a related note, there is also ongoing work on making these schemas available as artifacts so that downstream tools could bundle them and use them for validation, e.g. when working offline or simply for performance reasons - see https://github.com/ome/ngff/pull/77. So far, most of the work has been driven by the Python drivers, but there are also design decisions to be made that should be fully language-agnostic. Do you have a use case for caching these schemas and using them for validation, and would that mandate particular constraints in terms of layout or distribution?
@sbesson thanks for the information!
Yes, I am looking to validate in Python, JavaScript, and C++, so both the HTTP distribution and a package distribution like #77 would be helpful. #77 looks good to me. I will fetch from the GitHub repository for now and report what works well.
As a follow-up, I just opened https://github.com/ome/spec-prod/pull/2 to update the logic to publish the JSON schemas. Now is probably a good time for anyone to suggest alternate permanent URLs for these schemas before we start deploying them to the gh-pages branch.
I assume we'll also want the existing schemas to be listed from https://ngff.openmicroscopy.org/latest/, maybe as a separate section? Or within each specification section?