
Packages that vendor the JSON Schemas

Open thewtex opened this issue 3 weeks ago • 7 comments

PyPI (Python), Maven (Java), Crates.io (Rust), and npm (JavaScript) packages that contain the JSON schemas. This will allow tools to reference the schemas offline. Derived from the discussion in #58 .

thewtex avatar Dec 12 '25 16:12 thewtex

How about using a git submodule? They're just a collection of static files, which can be a bit awkward to package in certain language repos, and trying to keep many software packages up to date across different ecosystems (= different maintainers, probably) is much harder than just relying on git.

clbarnes avatar Dec 15 '25 11:12 clbarnes

Yes, the package repository/repositories themselves could use git submodules or git subtrees. And every tool maintainer could use git submodules or git subtrees, but this duplicates effort across more software packages and more maintainers.

thewtex avatar Dec 15 '25 12:12 thewtex

These proposed software packages will need to get the JSON schema documents somehow. Let's call whatever method these packages use to get the JSON schema documents X. When is it better for an ome-zarr python package to outsource X to some third-party package, vs. doing X itself?

d-v-b avatar Dec 15 '25 13:12 d-v-b

I guess every OME-Zarr implementor is already using some language-specific dependency management tool and isn't necessarily using any git submodules. The question is whether the maintenance burden of managing a new git submodule integration is harder for the OME-Zarr implementors than it is to create and maintain a JSON schema package for every language. If we were to go this way, my strong recommendation would be to do it in a monorepo with all of the packages published by CI, so all of the languages' packages are in sync.

Another question is: how much do we want to encourage relying on JSON Schema-based validation? JSON Schema can't express many of the validation rules the spec requires, which means a properly validating implementation must do a load of its own validation anyway. The validations JSON Schema can perform are pretty trivial to code (compared to the inexpressible ones), to the point where wiring JSON Schema into your own validation layer may be more work than just hand-writing the same rules. And because JSON Schema references are URIs, not URLs, resolving them is non-trivial.
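
To illustrate, here is a hand-written check for a cross-field rule that JSON Schema alone cannot express: that a scale transformation has one entry per axis. The rule is paraphrased for illustration only, and `validate_multiscale` and the document shape are sketches, not taken from any existing implementation:

```python
# Hand-written validation of a cross-field rule that JSON Schema cannot
# express: every "scale" transformation must have one entry per axis.
# (Rule paraphrased for illustration; consult the spec for the
# authoritative requirements.)

def validate_multiscale(multiscale: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    n_axes = len(multiscale.get("axes", []))
    for dataset in multiscale.get("datasets", []):
        for transform in dataset.get("coordinateTransformations", []):
            if transform.get("type") == "scale":
                scale = transform.get("scale", [])
                if len(scale) != n_axes:
                    errors.append(
                        f"dataset {dataset.get('path')!r}: scale has "
                        f"{len(scale)} entries but there are {n_axes} axes"
                    )
    return errors

# Example: a 3-axis multiscale whose scale vector only has 2 entries.
doc = {
    "axes": [{"name": "z"}, {"name": "y"}, {"name": "x"}],
    "datasets": [
        {
            "path": "0",
            "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0]}],
        }
    ],
}
print(validate_multiscale(doc))
```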

clbarnes avatar Dec 15 '25 13:12 clbarnes

The idea is to ship them in a stable, documented location inside the package, and also expose them through a canonical programmatic API.

Here is what that could look like in each ecosystem (AI-generated):

Python

  • Packaging layout: Put schemas under the package tree, e.g. src/mypkg/schemas/*.json, and include them as package data via setuptools ([tool.setuptools.package-data] or package_data / include_package_data).
  • Library access: Provide helpers that return the schema text or bytes via importlib.resources, e.g. get_schema("name") -> str that calls importlib.resources.files("mypkg.schemas").joinpath("name.json").read_text(). This works whether the package is installed as a directory or as a zipped wheel.
  • End‑user access:
    • Document the on-disk location as “importlib.resources.files('mypkg.schemas')” not as a hardcoded path, so users can call the same helper or use files() directly.
    • Optionally add a CLI (python -m mypkg dump-schema name) that prints or writes the schema file for users who are not writing Python code.
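
A minimal sketch of the helper described above. `mypkg` and `get_schema` are illustrative names; to keep the snippet self-contained, it first builds a throwaway package with one bundled schema in a temp directory:

```python
# Sketch of a schema-access helper built on importlib.resources.
# "mypkg" and get_schema() are illustrative names, not an existing package.
import json
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Create a throwaway package mypkg.schemas with one bundled schema,
# so the helper below has something real to load.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "mypkg" / "schemas"
pkg.mkdir(parents=True)
(tmp / "mypkg" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")
(pkg / "image.json").write_text(json.dumps({"$id": "image.schema.json", "type": "object"}))
sys.path.insert(0, str(tmp))

def get_schema(name: str) -> dict:
    """Load a bundled schema by name; works from a directory or a zipped wheel."""
    text = resources.files("mypkg.schemas").joinpath(f"{name}.json").read_text()
    return json.loads(text)

print(get_schema("image")["$id"])  # -> image.schema.json
```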

Java (JAR)

  • Packaging layout: Put schemas in src/main/resources/com/example/mylib/schemas/*.json so they end up in the JAR on the classpath.
  • Library access: Wrap getResourceAsStream("/com/example/mylib/schemas/foo.json") or similar behind a utility method like SchemaLoader.get("foo") that returns an InputStream for validation APIs.
  • End‑user access:
    • Document the resource paths so non-Java users can extract schemas from the JAR (a JAR is a normal zip archive).
    • If schemas should be referenced by URI, publish them at a stable HTTP URL and also keep identical copies inside the JAR; validators can resolve by URL, while Java code can still use classpath resources.
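
Since a JAR is an ordinary zip archive, non-Java tooling can pull the bundled schemas out directly. A sketch from the Python side (the JAR name and resource path are illustrative; the snippet builds a toy JAR first so it runs standalone):

```python
# Reading a schema bundled in a JAR without any Java tooling:
# a JAR is just a zip file, so zipfile can read its members directly.
import json
import tempfile
import zipfile
from pathlib import Path

# Build a toy JAR with one bundled schema (illustrative path and content).
jar = Path(tempfile.mkdtemp()) / "mylib.jar"
with zipfile.ZipFile(jar, "w") as zf:
    zf.writestr("com/example/mylib/schemas/foo.json", '{"type": "object"}')

# Extraction: read the classpath resource like any other zip member.
with zipfile.ZipFile(jar) as zf:
    schema = json.loads(zf.read("com/example/mylib/schemas/foo.json"))

print(schema)  # -> {'type': 'object'}
```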

npm (Node / browser)

  • Packaging layout: Put schemas under a public directory (e.g. schemas/, dist/schemas/), and ensure package.json’s files/.npmignore do not exclude them.
  • Library access (Node):
    • Export either the schema contents or a loader from your main entry, e.g. module.exports.mySchema = require("./schemas/my-schema.json"); in CommonJS, or a getSchema(name) function that reads and parses the file at runtime.
    • For tools that want file paths, expose a helper like getSchemaPath("name") using path.join(__dirname, "schemas", "name.json").
  • End‑user / tooling access:
    • Give each schema a stable $id/URL and host it on the web or via a CDN (GitHub Pages, npm-derived CDNs, or a custom domain), so editors and validators can fetch it directly (similar to JSON Schema Store).
    • Document both the URL (for external tools) and the in-package path/API (for programmatic use).

Rust crates

  • If schemas are static and versioned with the crate:
    • Put them under a directory like schemas/ next to src/ and embed via include_str! / include_bytes!, then expose them as functions like pub fn schema(name: &str) -> &'static str.
    • This ensures both library code and crate users always have the correct version, without worrying about runtime file locations.
  • End‑user access: document a supported way to get them: either “call mylib::schema("foo")” or “run mytool --dump-schema foo > foo.json”.

thewtex avatar Dec 15 '25 13:12 thewtex

The idea is to ship them in a stable, documented location inside the package, and also expose them through a canonical programmatic API.

I guess if developers want this, then the maintenance burden here might make sense? Beyond the added complexity, the risk to a user of one of these packages is that a new version of the schemas gets released but the package doesn't get updated to match.

For my own use, I would assume that the JSON schema documents are located at a URL (i.e., a stable, documented location), and I would expect that they are JSON (this provides the programmatic API), so encapsulating this in a python package wouldn't add much. But maybe there are other scenarios where the proposed bundling would be useful.
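
For comparison, the URL-based workflow really is only a few lines. A sketch, where the base URL and path layout are hypothetical placeholders, not an actual hosting scheme:

```python
# Minimal sketch of the "schemas live at a URL" workflow.
# BASE and the path layout are hypothetical, not the project's real hosting.
import json
from urllib.request import urlopen

BASE = "https://ngff.openmicroscopy.org/schemas"  # hypothetical base URL

def schema_url(version: str, name: str) -> str:
    """Build the (assumed) URL of a versioned schema document."""
    return f"{BASE}/{version}/{name}.schema.json"

def fetch_schema(version: str, name: str) -> dict:
    """Fetch and parse a schema; requires network access."""
    with urlopen(schema_url(version, name)) as resp:
        return json.load(resp)

print(schema_url("0.5", "image"))
```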

d-v-b avatar Dec 15 '25 13:12 d-v-b

The objective of this issue is to deduplicate effort: provide offline access to the JSON schemas, and the standardization they offer (even if limited), in a simple, consistent way that is supported across languages.

thewtex avatar Dec 15 '25 14:12 thewtex