Mechanism for indicating metadata properties only relevant for writing

Open jbms opened this issue 8 months ago • 2 comments

Prompted by https://github.com/zarr-developers/zarr-specs/pull/330#issuecomment-2766522374, I created this issue to discuss metadata properties that are only relevant for writing, not reading.

For example, all of the current configuration options for blosc, gzip, and zstd are only needed when writing and can be ignored when reading.
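A minimal illustration of this point, using Python's standard-library gzip module as a stand-in for the Zarr gzip codec (an assumption made only to keep the example dependency-free): the compression level is an input to compression but not to decompression, so a reader never needs to know which level the writer chose.

```python
import gzip

# Chunk bytes to be stored; any bytes object works here.
data = b"zarr chunk payload " * 100

# Encoding needs the configuration (here, the compression level).
encoded_fast = gzip.compress(data, compresslevel=1)
encoded_small = gzip.compress(data, compresslevel=9)

# Decoding takes no level argument: the same call decodes both outputs.
assert gzip.decompress(encoded_fast) == data
assert gzip.decompress(encoded_small) == data
```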

The must_understand property is one mechanism for indicating required vs. optional parameters (though I think it is technically up to each extension to specify that the convention applies to its configuration).

But it could be extended to something like must_understand_for_reading and must_understand_for_writing. Alternatively, this could be specified in some other way, e.g. an encode_options property that is assumed to apply only to encoding and not decoding, or a naming convention that indicates whether a property can be ignored for reading or writing, e.g. prefixing encode-only options with encode_ and decode-only options (not clear if that makes sense?) with decode_.
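To make the first option concrete, here is a rough sketch of how an implementation might consult mode-specific flags before opening an array. The keys must_understand_for_reading and must_understand_for_writing are the hypothetical names floated above, the extension name example_extension is made up, and treating a missing flag as "must understand" is an assumption of this sketch.

```python
from typing import Any, Iterable, Literal, Mapping

Mode = Literal["read", "write"]


def must_understand(obj: Mapping[str, Any], mode: Mode) -> bool:
    """Return True if an implementation must recognize `obj` to operate in `mode`."""
    # Hypothetical mode-specific flags take precedence when present.
    key = "must_understand_for_reading" if mode == "read" else "must_understand_for_writing"
    if key in obj:
        return bool(obj[key])
    # Otherwise fall back to the existing flag, treating a missing value as True.
    return bool(obj.get("must_understand", True))


def check_supported(objects: Iterable[Mapping[str, Any]], known: set[str], mode: Mode) -> list[str]:
    """Raise if an unrecognized object is required for `mode`; return the names that were skipped."""
    skipped = []
    for obj in objects:
        if obj["name"] in known:
            continue  # the implementation understands this object
        if must_understand(obj, mode):
            raise ValueError(f"cannot {mode}: unsupported extension {obj['name']!r}")
        skipped.append(obj["name"])
    return skipped


extensions = [
    {"name": "blosc"},
    {"name": "example_extension", "must_understand_for_reading": False, "must_understand_for_writing": True},
]
print(check_supported(extensions, {"blosc"}, "read"))  # ['example_extension']
```

With the same metadata, check_supported(extensions, {"blosc"}, "write") raises, so a reader can proceed while a writer that does not recognize the extension refuses to modify the array.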

Similarly, consolidated metadata (assuming it duplicates, rather than replaces, the regular metadata) could safely be ignored when reading, but must not be ignored when writing, as otherwise the consolidated metadata would get out of sync.
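The asymmetry can be sketched with a toy in-memory store. The layout here is an assumption for illustration only: nodes are keyed by path ("" for the root group), and the root document holds a consolidated_metadata entry whose metadata field maps node paths to copies of their documents.

```python
from typing import Any

# Hypothetical in-memory store: node path ("" for the root group) -> metadata document.
Store = dict[str, dict[str, Any]]


def read_node_metadata(store: Store, path: str) -> dict[str, Any]:
    # A reader may ignore any consolidated copy and read the node's own document.
    return store[path]


def write_node_metadata(store: Store, path: str, metadata: dict[str, Any]) -> None:
    # A writer updates the node's own document...
    store[path] = metadata
    # ...and must also refresh the consolidated copy, if the root has one,
    # otherwise readers that rely on it would see stale metadata.
    consolidated = store[""].get("consolidated_metadata")
    if path != "" and consolidated is not None:
        consolidated["metadata"][path] = metadata


store: Store = {
    "": {"node_type": "group", "consolidated_metadata": {"metadata": {"a": {"node_type": "array", "shape": [10]}}}},
    "a": {"node_type": "array", "shape": [10]},
}
write_node_metadata(store, "a", {"node_type": "array", "shape": [20]})
print(store[""]["consolidated_metadata"]["metadata"]["a"]["shape"])  # [20]
```

A writer that skipped the second step would leave the consolidated copy reporting shape [10], which is exactly the out-of-sync state described above.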

encode_ and decode_ prefixes work well for codecs but are not ideal for array metadata; ideally we could use a consistent syntax for both.
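For codecs, the prefix convention could be applied as in the sketch below. The prefixed key names (encode_cname, encode_clevel, encode_shuffle) are a hypothetical respelling of blosc's existing cname, clevel and shuffle options, not names defined by any current spec.

```python
from typing import Any


def config_for(operation: str, configuration: dict[str, Any]) -> dict[str, Any]:
    """Keep only the configuration keys relevant to `operation` ("encode" or "decode")."""
    if operation not in ("encode", "decode"):
        raise ValueError(f"unknown operation: {operation!r}")
    # Hypothetical convention: drop keys reserved for the *other* operation;
    # unprefixed keys are kept for both.
    excluded_prefix = "decode_" if operation == "encode" else "encode_"
    return {k: v for k, v in configuration.items() if not k.startswith(excluded_prefix)}


# Hypothetical prefixed spelling of blosc's encode-only options.
blosc_config = {"encode_cname": "zstd", "encode_clevel": 5, "encode_shuffle": "bitshuffle"}
print(config_for("decode", blosc_config))  # {}
```

A reader sees an empty decode configuration, so it can ignore any encode_ key it does not recognize; the same trick has no obvious counterpart for top-level array metadata fields.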

jbms, Mar 31 '25 19:03

See also https://github.com/zarr-developers/zarr-specs/issues/277 and https://github.com/zarr-developers/zarr-specs/issues/270

jbms, Mar 31 '25 19:03

A few random ideas in this direction:

  • In the specific case of codecs, it might be useful to represent a single codec as a pair of encode and decode operations, which might take different parameters. This could manifest either within the declaration of the codec object itself, e.g. {"name": "blosc", "configuration": {"encode": {...}}}, where the absence of a decode configuration key denotes that reading requires no configuration, or in the structure of the codecs field, which could specify separate encode and decode sequences: "codecs": {"encode": [...], "decode": [...]}. If a codec does not appear in the decode sequence, it is not required for reading.

  • Each metadata object could have an optional required_for key listing the names of the operations that the metadata field is required for, e.g.:

{
  "name": "foo", 
  "required_for": ["chunk_read", "chunk_write", "attribute_read", "attribute_write"],
  "configuration": {...}
}

I think reading/writing chunks and attributes covers the array API; the group API might need more operations, e.g. "create_node", "update_node", "delete_node", etc.
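A rough sketch of how an implementation might use required_for to decide whether it can perform a given operation. The function name blocking_objects is hypothetical, and treating a missing required_for as "required for every operation" is an assumption chosen to stay conservative.

```python
from typing import Any, Iterable


def blocking_objects(operation: str, objects: Iterable[dict[str, Any]], known: set[str]) -> list[str]:
    """Names of unrecognized metadata objects that prevent performing `operation`."""
    blocking = []
    for obj in objects:
        if obj["name"] in known:
            continue  # the implementation understands this object
        # Assumption: a missing "required_for" means the object is required for every operation.
        required_for = obj.get("required_for")
        if required_for is None or operation in required_for:
            blocking.append(obj["name"])
    return blocking


metadata_objects = [{"name": "foo", "required_for": ["chunk_write", "attribute_write"], "configuration": {}}]
print(blocking_objects("chunk_read", metadata_objects, set()))   # []
print(blocking_objects("chunk_write", metadata_objects, set()))  # ['foo']
```

Because the check is keyed by operation name, extending it to group operations such as "create_node" needs no code change, only new entries in required_for.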

d-v-b, Jul 08 '25 21:07
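The idea of separate encode and decode codec sequences could be consumed by a reader roughly as follows. This is a sketch under stated assumptions: the decode list is taken to be written in the order its steps are applied, the DECODERS registry is hypothetical, gzip stands in for any codec, and example_write_check is a made-up write-only step that the reader never needs to implement.

```python
import gzip
from typing import Any, Callable

# Hypothetical registry of the decoders this reader implements, keyed by codec name.
DECODERS: dict[str, Callable[[bytes, dict[str, Any]], bytes]] = {
    "gzip": lambda data, config: gzip.decompress(data),  # gzip needs no decode configuration
}


def decode_chunk(data: bytes, codecs: dict[str, list[dict[str, Any]]]) -> bytes:
    """Decode `data` using only the "decode" sequence; the "encode" sequence is never read."""
    # Assumption: the decode list is written in the order its steps are applied.
    for codec in codecs["decode"]:
        decoder = DECODERS.get(codec["name"])
        if decoder is None:
            raise ValueError(f"unsupported codec for reading: {codec['name']!r}")
        data = decoder(data, codec.get("configuration", {}))
    return data


codecs = {
    # "example_write_check" is a hypothetical write-only step that this reader does not implement.
    "encode": [{"name": "example_write_check"}, {"name": "gzip", "configuration": {"level": 9}}],
    "decode": [{"name": "gzip"}],
}
raw = b"chunk" * 50
assert decode_chunk(gzip.compress(raw, compresslevel=9), codecs) == raw
```

The reader succeeds even though it has no implementation of example_write_check, because that codec appears only in the encode sequence.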