zarr-specs
Mechanism for indicating metadata properties only relevant for writing
Prompted by https://github.com/zarr-developers/zarr-specs/pull/330#issuecomment-2766522374 I created this issue to discuss metadata properties that are only relevant for writing, not reading.
For example, all of the current configuration options for blosc, gzip, and zstd are only needed when writing, and can be ignored when reading.
The must_understand property is one mechanism for indicating required vs optional parameters (I think technically it is up to each extension to indicate that the convention applies to their configuration).
But it could be extended to something like must_understand_for_reading and must_understand_for_writing, or specified in some other way: for example, an encode_options property that is assumed to apply only to encoding and not decoding, or a naming convention that indicates whether a property can be ignored for reading or writing, such as prefixing encode-only options with encode_ and decode-only options (not clear if that makes sense?) with decode_.
Similarly, consolidated metadata (assuming it duplicates, rather than replaces, the regular metadata) could safely be ignored when reading, but must not be ignored when writing, as otherwise the consolidated metadata would become out-of-sync.
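To make the prefix naming convention concrete, here is a minimal Python sketch of how an implementation might decide which configuration keys it has to understand for a given operation. Nothing here is part of any Zarr specification: the helper names (split_configuration, keys_required), the "read"/"write" operation labels, and the example keys (encode_level, decode_threads, layout) are all hypothetical and chosen only for illustration.

```python
def split_configuration(configuration):
    """Partition a configuration dict by the hypothetical encode_/decode_ prefixes."""
    encode_only, decode_only, shared = {}, {}, {}
    for key, value in configuration.items():
        if key.startswith("encode_"):
            encode_only[key] = value
        elif key.startswith("decode_"):
            decode_only[key] = value
        else:
            # Unprefixed keys are assumed to matter for both reading and writing.
            shared[key] = value
    return encode_only, decode_only, shared


def keys_required(configuration, operation):
    """Return the set of keys that must be understood for 'read' or 'write'."""
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation!r}")
    encode_only, decode_only, shared = split_configuration(configuration)
    required = set(shared)
    if operation == "write":
        required |= set(encode_only)
    else:
        required |= set(decode_only)
    return required


# Hypothetical configuration; these key names do not come from any real codec.
example = {"encode_level": 5, "decode_threads": 2, "layout": "x"}
read_keys = keys_required(example, "read")
write_keys = keys_required(example, "write")
```

Under this sketch a reader could skip encode_level entirely, while a writer would have to understand it. The same partitioning would work with a nested encode_options property instead of prefixes.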
encode_ and decode_ as prefixes work well for codecs but are not ideal for array metadata; ideally we can use a consistent syntax.
See also https://github.com/zarr-developers/zarr-specs/issues/277 and https://github.com/zarr-developers/zarr-specs/issues/270
A few random ideas in this direction:
- In the specific case of codecs, it might be useful to represent a single codec as a pair of encode and decode operations, which might take different parameters. This could manifest either within the declaration of the codec object itself, e.g. {"name": "blosc", "configuration": {"encode": {...}}}, where the lack of a decode configuration key denotes that reading requires no configuration, or in the structure of the codecs field, which could specify separate encode and decode sequences: "codecs": {"encode": [...], "decode": [...]}. If a codec does not appear in the decode sequence, it is not required for reading.
- Each metadata object could have an optional required_for key, which lists the names of operations that the metadata field is required for, e.g.,
  {
    "name": "foo",
    "required_for": ["chunk_read", "chunk_write", "attribute_read", "attribute_write"],
    "configuration": {...}
  }
I think reading / writing chunks and attributes covers the array API; the group API might be more extensive, e.g. "create_node", "update_node", "delete_node", etc.
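As a rough illustration of the required_for idea, the following Python sketch shows how an implementation might check whether it can safely perform an operation. It is only a sketch under stated assumptions: the function name blocking_objects and the understood_names parameter are hypothetical, and treating a missing required_for key as "required for every operation" is my assumption, not something proposed above.

```python
def blocking_objects(metadata_objects, operation, understood_names):
    """Return names of metadata objects that are required for `operation`
    but that this implementation does not understand."""
    blocking = []
    for obj in metadata_objects:
        required_for = obj.get("required_for")
        # Assumption: no required_for key means the object is always required.
        needed = required_for is None or operation in required_for
        if needed and obj["name"] not in understood_names:
            blocking.append(obj["name"])
    return blocking


# Hypothetical metadata objects, modelled on the "foo" example above.
objects = [
    {"name": "foo", "required_for": ["chunk_write"], "configuration": {}},
    {"name": "bar", "required_for": ["chunk_read", "chunk_write"], "configuration": {}},
]

# An implementation that understands "bar" but not "foo".
read_blockers = blocking_objects(objects, "chunk_read", understood_names={"bar"})
write_blockers = blocking_objects(objects, "chunk_write", understood_names={"bar"})
```

Here reading chunks is safe because "foo" is only required for chunk_write, whereas writing would have to be refused. Group-level operations such as "create_node" would slot into the same check as additional operation names.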