ome-zarr-py
ome-zarr-py copied to clipboard
Initial validate support, with --strict option
Work in progress...
I am testing the JSON schemas from ngff repo, downloading them to a cache with https://github.com/allenai/cached_path.
This PR adds a validate
command that validates the JSON at each Node (currently only working for Multiscales spec).
NB: some validation occurs during the init of the Nodes e.g. axes matching xyzct etc. so if that fails then we don't even get to validate against the schemas.
We use --strict
flag to use strict_
schemas to show warnings where strict rules are not met.
cc @glyg
$ ome_zarr validate https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001247.zarr --strict
WARNING:ome_zarr.io:version mismatch: detected:FormatV01, requested:FormatV03
WARNING:ome_zarr.io:version mismatch: detected:FormatV03, requested:FormatV01
WARNING:ome_zarr.io:version mismatch: detected:FormatV01, requested:FormatV03
WARNING:ome_zarr.io:version mismatch: detected:FormatV03, requested:FormatV01
Validating Multiscales spec at https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001247.zarr/ [zgroup]
Using Multiscales schema version 0.1
WARNING 'name' is a required property
WARNING 'type' is a required property
WARNING 'metadata' is a required property
Validating Multiscales spec at https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001247.zarr/labels/masks/ [zgroup]
Using Multiscales schema version 0.1
WARNING 'name' is a required property
WARNING 'type' is a required property
WARNING 'metadata' is a required property
Codecov Report
Merging #142 (c03b0a0) into master (d729a38) will increase coverage by
0.49%
. The diff coverage is95.94%
.
@@ Coverage Diff @@
## master #142 +/- ##
==========================================
+ Coverage 83.90% 84.39% +0.49%
==========================================
Files 12 12
Lines 1379 1442 +63
==========================================
+ Hits 1157 1217 +60
- Misses 222 225 +3
Impacted Files | Coverage Δ | |
---|---|---|
ome_zarr/data.py | 87.00% <ø> (ø) |
|
ome_zarr/utils.py | 90.19% <88.88%> (-0.61%) |
:arrow_down: |
ome_zarr/reader.py | 86.15% <97.77%> (+1.15%) |
:arrow_up: |
ome_zarr/cli.py | 89.77% <100.00%> (+1.16%) |
:arrow_up: |
ome_zarr/format.py | 97.59% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update d729a38...c03b0a0. Read the comment docs.
Hi @will-moore thanks for the ping. I'm waiting a bit to advance on the rest of the validation, mainly because I'm not sure of what will be left to validate once you're done :) But maybe I'm wrong and the ticked list here is all there is?
If so I'll try to get that started soon
Trying to capture knowledge from our historical experience with XML and XSD schemas, my understanding is that the semantics of JSON schemas is very similar on this particular issue and only distinguish between required
and optional
attributes. There is no real recommended
concept and one reason for this is that recommended
is loosely defined from a validation perspective. A few examples of open questions:
- assuming a level exists between
required
andoptional
, is this a flat level or would some of the attributes morerecommended
than others? - is the recommended state context dependent i.e. is it the responsibility of the tool/application to make decisions on recommended attributes and report accordingly?
Brainstorming a few ideas:
- from former discussions around minimal requirements, one idea would be to introduce different level of validations e.g. level 0 would be the absolute minimal specification (similar to https://docs.openmicroscopy.org/ome-model/latest/specifications/minimum.html), level 1 would require some attributes marked as
optional
in the base schema, level 2 would require more attributes... - is there some infrastructure that would allow someone to inject additional validation rules? The JSON schema would provide the base validation framework i.e. ensuring the syntax is valid and that the required properties are specified but e.g. some profile would allow to specify additional constraints?
@glyg I wouldn't claim that the list there is complete - It's just what I picked up from the spec, looking for MUST
rules. But not every detail of the spec is spelt out (it's possible to imagine a file that passed all those rules and was still invalid for some other reason).
However, it's also likely that some of those unchecked items are already covered by ome-zarr-py
. E.g. some axes validation at https://github.com/ome/ome-zarr-py/blob/a3f50286bcaebfd1d6f03f8e82c6cb9ee0310393/ome_zarr/reader.py#L287 (when you try to parse any zarr image) and https://github.com/ome/ome-zarr-py/blob/a3f50286bcaebfd1d6f03f8e82c6cb9ee0310393/ome_zarr/writer.py#L17 (currently only used when writing an OME-Zarr).
@sbesson The method I've used here (adding more restrictions to the schema via merging python dicts) seems to work so far and could be extended for various levels. It's still limited by the JSON schema validation, so the logic can't get too complex, but it seems worth exploring in more depth.
@sbesson I've updated the approach to match the allOf
method of combining schemas adopted at https://github.com/ome/ngff/pull/76
Re: required, optional and recommended: The spec uses MUST
(required), SHOULD
(recommended) and MAY
(optional) rules. I don't think this is tool/application specific.
This will fail checks until https://github.com/ome/ngff/pull/77 is merged and released on pypi so that ngff
is pip installable (now a requirement of this PR).
Updated to use the latest naming change to ome_ngff
from https://github.com/ome/ngff/pull/77
$ ome_zarr validate https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/idr0079A/9836998.zarr --warnings
WARNING:ome_zarr.io:version mismatch: detected:FormatV03, requested:FormatV04
WARNING:ome_zarr.io:version mismatch: detected:FormatV04, requested:FormatV03
WARNING:ome_zarr.io:version mismatch: detected:FormatV03, requested:FormatV04
WARNING:ome_zarr.io:version mismatch: detected:FormatV04, requested:FormatV03
WARNING:ome_zarr.reader:'metadata' is a required property
WARNING:ome_zarr.reader:'type' is a required property
WARNING:ome_zarr.reader:'name' is a required property
WARNING:ome_zarr.reader:'metadata' is a required property
WARNING:ome_zarr.reader:'type' is a required property
WARNING:ome_zarr.reader:'name' is a required property
Added visit()
as suggested by @constantinpape in c37e116
@satra mentioned that it would be useful to add validation of Zarrs into https://github.com/dandi/dandi-cli (After I pointed out this :smile:)
Thoughts on getting this into a release?
@sbesson It looks like https://ngff.openmicroscopy.org/0.4/schemas/plate.schema is 404
, but this looks like the correct URL? Using https://raw.githubusercontent.com/ome/ngff/main/0.4/schemas/plate.schema for now...
With that last commit, renamed --warnings
to --strict
and added support for validating Wells and Plates.
$ ome_zarr validate https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0056B/7361.zarr --strict
Downloading https://raw.githubusercontent.com/ome/ngff/main/0… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 4.2/4.2 kB
Downloading https://raw.githubusercontent.com/ome/ngff/main/0… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 568/568 bytes
WARNING:ome_zarr.reader:'name' is a required property
WARNING:ome_zarr.reader:'maximumfieldcount' is a required property
Well:
$ ome_zarr validate https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0056B/7361.zarr/A/1/ --strict
Downloading https://raw.githubusercontent.com/ome/ngff/main/0… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 1.2/1.2 kB
Downloading https://raw.githubusercontent.com/ome/ngff/main/0… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 309/309 bytes
WARNING:ome_zarr.reader:'version' is a required property
Thanks: currently, Multiscales
spec has
self.version = multiscales[0].get(
"version", "0.1"
) # should this be matched with Format.version?
whereas elsewhere I've been using CurrentFormat().version
as the default instead of "0.1"
. OK to update that behaviour for Multiscales to use current version as default?
OK to update that behaviour for Multiscales to use current version as default?
At lest from my side, yes. I assume that's the spirit of the comment in https://github.com/ome/ome-zarr-py/blob/6c24f139b7f24a8d2d9b98a53d73b9e1353f0576/ome_zarr/reader.py#L284
I do not see a reason why Multiscales
should not be treated differently from other specs. If the consensus of the library is to default to the latest published specification in the absence of version
metadata, it makes sense to apply it everywhere.
While on the topic of version retrieval, I also noticed another location https://github.com/ome/ome-zarr-py/blob/6c24f139b7f24a8d2d9b98a53d73b9e1353f0576/ome_zarr/format.py#L67-L82 which contract is to retrieve a version and which duplicates a lot of the logic in this PR. Is there a rationale for having this in seprate places or does that point at the need for some form of Spec.get_version()
API that can be consumed both by Spec.validate()
and by Format.matches
?
@sbesson I've used format.get_metadata_version()
(no-longer private method) in the Spec.init()
. I first need to create an instance of the Format class, since this is not a static method. It could be a static method, since we don't currently have a different implementation for any of the Format subclasses, but it's possible that we might want to look-up the version differently for subclasses in the future, so I didn't change it to a static method.
@sbesson I added a --clear_cache
option that removes the cached_path
directory and forces a fresh download.