
use declarative representation of the spec

Open d-v-b opened this issue 3 years ago • 14 comments

I ran into issues viewing my ome-ngff-formatted zarr data via the napari plugin. Napari was exiting with an error emitted from this line of this package, which is a try...except block that catches ANY exception and doesn't handle the content of the exception at all.

Unfortunately, swallowing exceptions without a useful error message made it very hard to debug any problems with my data. I suspect this style of coding is a side-effect of taking a procedural approach to the OME-NGFF metadata (i.e., "valid metadata" means "a bunch of code ran without errors"). I think a much better approach would be to define a python class that represents the structure of the metadata and handles parsing / validation of the input. Such a declarative approach (i.e., "valid metadata" means "the program created an object structurally equivalent to the metadata") can produce useful error messages when a single field fails to validate. (In my case, the problem was with the version string, but I had to figure that out manually.)
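To make the contrast concrete, here is a minimal stdlib-only sketch of the declarative idea (a plain dataclass stands in for a pydantic model, and the field names and supported-version set are illustrative, not the real spec): a bad field names itself in the error instead of being swallowed.

```python
from dataclasses import dataclass

# Hypothetical: the real set of versions depends on the spec release.
SUPPORTED_VERSIONS = {"0.4"}

@dataclass
class Multiscales:
    version: str
    name: str = ""

    def __post_init__(self):
        # Validation happens at construction time, so "valid metadata"
        # literally means "this object exists".
        if self.version not in SUPPORTED_VERSIONS:
            raise ValueError(
                f"multiscales.version: {self.version!r} is not one of "
                f"{sorted(SUPPORTED_VERSIONS)}"
            )

try:
    Multiscales(version="0.3-draft")
except ValueError as e:
    print(e)  # the error names the exact field that failed
```

A library like pydantic does the same thing for every field at once and reports all failures together, which is the behavior the procedural try/except approach loses.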

For examples of declarative implementations of the OME-NGFF spec, see iohub and pydantic-ome-ngff. As the author of the second project, I would love to see this functionality under the umbrella of the official OME organization.

d-v-b avatar Mar 07 '23 20:03 d-v-b

Hi @d-v-b. Thanks for opening this! I'll reiterate the :+1: from my gitter conversation with you as well as the :+1::heart: from the Zarr community meeting last night, and additionally a :100: to everyone in the community that we find ways to work together on fewer libraries rather than everyone needing to roll their own. (Maybe we need a community meeting ... per language? ... just on that topic.)

I haven't gone through the code in detail but I know we need it. The Java crew that's met at several hackathons similarly started pushing to https://github.com/ome/ome-ngff-java, so without putting too much thought into the completeness of it, here's a list of TODOs that I'd suggest:

  • [x] Fix or capture the exception handling mentioned above :smile: #266
  • [ ] Transfer the repo to @joshmoore (Davis)
    • [ ] who transfers to @ome (Josh)
    • [ ] or say if you'd prefer a new repo be created (Davis/Josh)
    • cf. https://github.com/JaneliaSciComp/pydantic-ome-ngff/issues/3
  • [ ] Reach out to iohub about possible overlap
  • [ ] Consider renaming to ome-ngff-py (i.e. the base model in each language)
  • [ ] Document these core libraries on ngff.o.org or similar
  • [ ] Update ome-zarr-py etc. to use the library
  • [ ] Request feedback on the API, etc.
  • [ ] Work toward a first stable release

joshmoore avatar Mar 09 '23 07:03 joshmoore

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/napari-napari-ome-zarr-plugin-is-not-registered/78482/3

imagesc-bot avatar Mar 13 '23 21:03 imagesc-bot

Late to the discussion, but happy to move this code upstream for v0.4 too!

ziw-liu avatar Mar 20 '23 22:03 ziw-liu

Just starting to look at this again, thinking how ome-zarr-py might use pydantic-ome-ngff...

Not being familiar with pydantic yet... it looks like the easiest way to validate when reading an NGFF image is to parse the json?

from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

from pydantic_ome_ngff.v04.multiscales import Multiscale
from pydantic import ValidationError

reader = Reader(parse_url("6001240.zarr"))
nodes = list(reader())
# first node will be the image pixel data
image_node = nodes[0]
ngff_json = image_node.zarr.root_attrs
try:
    Multiscale.parse_obj(ngff_json["multiscales"][0])
except ValidationError as e:
    # the ValidationError lists every field that failed, with its location
    print("Error found:", e.json())

This approach keeps all the pydantic models separate from the ome-zarr-py classes.

But I'm wondering whether we should be thinking about combining them. E.g. https://github.com/ome/ome-zarr-py/blob/41bd4431a4b42a16120e2b31802725fa0f4d00e3/ome_zarr/reader.py#L268 and https://github.com/JaneliaSciComp/pydantic-ome-ngff/blob/176e4521acb5b0e4e9d19f3e15e72fedf87789e4/src/pydantic_ome_ngff/v04/multiscales.py#L67 seem like they're kinda equivalent, but I've not looked too closely at how that might work...?

will-moore avatar Apr 21 '23 13:04 will-moore

This approach keeps all the pydantic models separate from the ome-zarr-py classes.

But I'm wondering whether we should be thinking about combining them. E.g.

If combining them means that the class for array I/O is a child of a pydantic model class, I think it would generally make things cumbersome, since pydantic models are not designed to be stateful.

Edit: not

ziw-liu avatar Apr 21 '23 17:04 ziw-liu

+1 for consolidating models and schema for ome-ngff with pydantic, both https://github.com/JaneliaSciComp/pydantic-ome-ngff and https://github.com/czbiohub/iohub/blob/main/iohub/ngff_meta.py are really great efforts!

giovp avatar Apr 27 '23 13:04 giovp

If combining them means that the class for array I/O is a child of a pydantic model class, I think it would generally make things cumbersome, since pydantic models are not designed to be stateful.

The reader/writer class could compose over the pydantic class (e.g. at a .metadata instance variable), or pydantic could be used for validating IO and then the user-facing class could be constructed from pydantic models under the hood.
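A sketch of that composition idea (all names here are hypothetical, and a frozen dataclass stands in for the pydantic model): the stateful reader owns a validated, immutable metadata object rather than inheriting from it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MultiscalesMeta:
    # Stand-in for a pydantic model: immutable, validated-at-construction metadata.
    version: str
    name: str

class ImageReader:
    """Hypothetical reader that composes over the metadata model."""

    def __init__(self, attrs: dict):
        # Validate once at the IO boundary...
        self.metadata = MultiscalesMeta(**attrs["multiscales"][0])
        # ...and keep mutable reader state out of the model entirely.
        self._chunk_cache = {}

reader = ImageReader({"multiscales": [{"version": "0.4", "name": "example"}]})
print(reader.metadata.version)
```

This keeps the pydantic class purely declarative (as ziw-liu suggests it should be) while the user-facing class stays free to carry caches, file handles, and other state.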

clbarnes avatar Jul 09 '23 14:07 clbarnes

cool kidz are also drinking some https://linkml.io Kool-Aid these days, as an even higher-level semantic description of the model than pydantic, and then producing pydantic and whatnot serializations/exports from it.

yarikoptic avatar Feb 21 '24 15:02 yarikoptic

@yarikoptic can you link to a repo of said cool kidz doing this? Because while I have heard a lot about linkml from a "this looks cool" perspective, I have seen very little code, and the code I have seen looked like it was doing quite a bit more than we need.

d-v-b avatar Feb 21 '24 15:02 d-v-b

@d-v-b, that's one use of linkml as opposed to anything core. Calling out some names that I know of:

  • @glyg
  • @mrmh2
  • @melonora
  • @satra

I think we've discussed it elsewhere (Zürich?) but in general I don't mind the starting point of our pipeline being pydantic, but I very much think we need to be careful not to overfit to it because we will need to transform/translate out and into other representations. LinkML does that quite well. And for one of the representations I'm personally interested in (JSON-LD), it's likely one of the best, though I admit that's less critical for the core NGFF types than for additional metadata that I would like to record with/in the NGFF.

joshmoore avatar Feb 21 '24 17:02 joshmoore

I think we've discussed it elsewhere (Zürich?) but in general I don't mind the starting point of our pipeline being pydantic, but I very much think we need to be careful not to overfit to it because we will need to transform/translate out and into other representations.

To be clear, the spec is JSON, so modeling the spec can be done by any python library that makes it easy to define JSON-serializable classes. With that constraint in mind, I think there's no risk that modeling the spec with pydantic, or dataclasses, or attrs, or marshmallow, would lead to any overfitting. On the other hand, not modeling the spec with one of the above libraries (or something functionally equivalent) in the reference python implementation of the spec generates friction for users and developers.
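The constraint in practice: any library that maps JSON to classes and back satisfies it. A stdlib-only round-trip with dataclasses, using illustrative field names rather than the full spec:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Axis:
    # Illustrative subset of an NGFF axis entry, not the complete schema.
    name: str
    type: str

doc = '[{"name": "y", "type": "space"}, {"name": "x", "type": "space"}]'

# Parse: JSON -> typed objects (a missing or extra key raises a TypeError here).
axes = [Axis(**a) for a in json.loads(doc)]

# Serialize: objects -> JSON, losslessly.
assert json.loads(json.dumps([asdict(a) for a in axes])) == json.loads(doc)
```

Pydantic, attrs, and marshmallow all refine this same parse/serialize pair with richer validation and error reporting; nothing about the JSON spec ties it to any one of them.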

d-v-b avatar Feb 21 '24 17:02 d-v-b

Sorry, I likely have mixed a few discussions. :+1: for a declarative representation at the core, I think we're clearly agreeing there, and then we can see how things shake out with pydantic from there.

joshmoore avatar Feb 21 '24 17:02 joshmoore

@joshmoore - just a note there was more work done during the workshop which brings us closer to defining the entire spec as linkml (https://github.com/linkml/linkml-arrays), which can in turn generate json, jsonld, etc.

satra avatar Feb 21 '24 18:02 satra

@joshmoore - just a note there was more work done during the workshop which brings us closer to defining the entire spec as linkml (https://github.com/linkml/linkml-arrays), which can in turn generate json, jsonld, etc.

and work ongoing on named arrays:)

melonora avatar Feb 21 '24 19:02 melonora