Packages that vendor the JSON Schemas
PyPI (Python), Maven (Java), Crates.io (Rust), and npm (JavaScript) packages that contain the JSON schemas. This would allow tools to reference them offline. Derived from the discussion in #58.
How about using a git submodule? The schemas are just a collection of static files, which can be awkward to package in certain language ecosystems, and keeping many software packages up to date across different ecosystems (which probably means different maintainers) is much harder than just relying on git.
Yes, the package repository/repositories themselves could use git submodules or git subtrees. And every tool maintainer could use git submodules or git subtrees, but this duplicates effort across more software packages and more maintainers.
These proposed software packages will need to get the JSON schema documents somehow. Let's call whatever method these packages use to get the schema documents X. When is it better for an ome-zarr python package to outsource X to some third-party package, versus doing X itself?
I guess every OME-Zarr implementor is already using some language-specific dependency management tool and isn't necessarily using any git submodules. The question is whether the maintenance burden of managing a new git submodule integration is harder for the OME-Zarr implementors than it is to create and maintain a JSON schema package for every language. If we were to go this way, my strong recommendation would be to do it in a monorepo with all of the packages published by CI, so all of the languages' packages are in sync.
Another question is: how much do we want to encourage relying on JSON Schema-based validation? JSON Schema can't express many of the validation rules that the spec requires, which means a properly validating implementation must do a lot of its own validation anyway. The validations JSON Schema can perform are fairly trivial to code (compared to the inexpressible ones), to the point where integrating JSON Schema into your own validation layer may be more work than hand-writing the same rules. And because JSON Schema references are URIs, not URLs, resolving them is non-trivial.
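To illustrate the kind of rule JSON Schema can't express, here is a hand-written cross-field check in Python. The metadata shape and the function name are simplified, hypothetical stand-ins (loosely inspired by "multiscales"-style metadata), not the real spec:

```python
# A cross-field rule JSON Schema cannot express: each dataset's scale
# transform must have one entry per declared axis. The dict shape here
# is a simplified, hypothetical stand-in for real OME-Zarr metadata.

def check_axes_consistency(multiscale: dict) -> list[str]:
    """Return human-readable errors for rules spanning multiple fields."""
    errors = []
    ndim = len(multiscale.get("axes", []))
    for i, dataset in enumerate(multiscale.get("datasets", [])):
        for tf in dataset.get("coordinateTransformations", []):
            scale = tf.get("scale")
            if scale is not None and len(scale) != ndim:
                errors.append(
                    f"datasets[{i}]: scale has {len(scale)} entries, "
                    f"expected {ndim} (one per axis)"
                )
    return errors

meta = {
    "axes": [{"name": "y"}, {"name": "x"}],
    "datasets": [
        {"coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0]}]}
    ],
}
print(check_axes_consistency(meta))
# → ['datasets[0]: scale has 3 entries, expected 2 (one per axis)']
```

A schema can constrain each field's shape in isolation, but this relationship between two sibling fields still needs hand-written code.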
The idea is to ship them in a stable, documented location inside the package, and also expose them through a canonical programmatic API.
Here is one way to do it (AI-generated):
Python

- Packaging layout: Put schemas under the package tree, e.g. `src/mypkg/schemas/*.json`, and include them as package data via setuptools (`[tool.setuptools.package-data]` or `package_data`/`include_package_data`).
- Library access: Provide helpers that return file-like or text/bytes contents via `importlib.resources`, e.g. `get_schema("name") -> str` that calls `importlib.resources.files("mypkg.schemas").joinpath("name.json").read_text()`. This works whether installed as a directory or a zipped wheel.
- End-user access:
  - Document the on-disk location as "`importlib.resources.files('mypkg.schemas')`", not as a hardcoded path, so users can call the same helper or use `files()` directly.
  - Optionally add a CLI (`python -m mypkg dump-schema name`) that prints or writes the schema file for users who are not writing Python code.
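The helper described above might look like this minimal sketch, assuming a hypothetical package named `mypkg` with schemas shipped as package data under `mypkg/schemas/`:

```python
# Minimal sketch of a schema-access helper, assuming a package "mypkg"
# that ships its schemas as package data under mypkg/schemas/*.json.
from importlib import resources

def get_schema(name: str) -> str:
    """Return the raw JSON text of a bundled schema."""
    return (
        resources.files("mypkg.schemas")
        .joinpath(f"{name}.json")
        .read_text(encoding="utf-8")
    )
```

Because `importlib.resources` abstracts over the installation format, the same helper works whether the package is installed flat on disk or inside a zipped wheel.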
Java (JAR)

- Packaging layout: Put schemas in `src/main/resources/com/example/mylib/schemas/*.json` so they end up in the JAR on the classpath.
- Library access: Wrap `getResourceAsStream("/com/example/mylib/schemas/foo.json")` or similar behind a utility method like `SchemaLoader.get("foo")` that returns an `InputStream` for validation APIs.
- End-user access:
  - Document the resource paths so non-Java users can extract the schemas from the JAR (they're normal files inside the jar/zip).
  - If schemas should be referenced by URI, publish them at a stable HTTP URL and also keep identical copies inside the JAR; validators can resolve by URL, while Java code can still use classpath resources.
npm (Node / browser)

- Packaging layout: Put schemas under a public directory (e.g. `schemas/`, `dist/schemas/`), and ensure `package.json`'s `files`/`.npmignore` do not exclude them.
- Library access (Node):
  - Export either the schema contents or a factory from your main entry, e.g. `export { mySchema } from "./schemas/my-schema.json";` or `export function getSchema(name) { return require("./schemas/" + name + ".json"); }`.
  - For tools that want file paths, expose a helper like `getSchemaPath("name")` using `path.join(__dirname, "schemas", "name.json")`.
- End-user / tooling access:
  - Give each schema a stable `$id`/URL and host it on the web or via a CDN (GitHub Pages, npm-derived CDNs, or a custom domain), so editors and validators can fetch it directly (similar to JSON Schema Store).
  - Document both the URL (for external tools) and the in-package path/API (for programmatic use).
Rust crates

- If schemas are static and versioned with the crate:
  - Put them under a directory like `schemas/` next to `src/` and embed them via `include_str!`/`include_bytes!`, then expose them as functions like `pub fn schema(name: &str) -> &'static str`.
  - This ensures both library code and crate users always have the correct version, without worrying about runtime file locations.
- End-user access: document a supported way to get them: either "call `mylib::schema("foo")`" or "run `mytool --dump-schema foo > foo.json`".
I guess if developers want this, then the maintenance burden here might make sense? Beyond the added complexity, the risk to a user of one of these packages is that a new version of the schemas is released but the package doesn't get updated.
For my own use, I would assume that the JSON schema documents are located at a URL (i.e., a stable, documented location), and I would expect that they are JSON (this provides the programmatic API), so encapsulating this in a python package wouldn't add much. But maybe there are other scenarios where the proposed bundling would be useful.
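For comparison, the URL-based workflow described here fits in a few lines of stdlib Python. This is a sketch only: the base URL is a placeholder (the real location would be wherever the spec publishes its schemas), and the injectable `opener` is just a convenience for substituting a cached or local source:

```python
import json
import urllib.request

# Placeholder: the real base URL would be wherever the spec hosts its
# schemas, not example.org.
SCHEMA_BASE = "https://example.org/ome-zarr/schemas"

def fetch_schema(name: str, opener=urllib.request.urlopen) -> dict:
    """Fetch a schema by name and parse it as JSON.

    The opener is injectable so a cached or local source can be
    substituted for the network.
    """
    with opener(f"{SCHEMA_BASE}/{name}.json") as resp:
        return json.load(resp)
```

Anything beyond this (offline bundling, version pinning) is what the proposed packages would add.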
The objective of this issue is to deduplicate the effort of providing offline access to the JSON schemas (which provide standardization, even if limited) in a simple, consistent way across languages.