
`must_understand=False` for codecs

Open d-v-b opened this issue 5 months ago • 8 comments

As of https://github.com/zarr-developers/zarr-specs/pull/330, codecs can be declared with must_understand=False. What's the use case for this?

cc @LDeakin

d-v-b avatar Jul 03 '25 16:07 d-v-b

Codecs that do pass-through decoding like bitround, quantize, etc., could declare this. Viewers can read such arrays without explicitly adding support for the codecs.
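A minimal sketch of what this could look like on the reader side, assuming each codec entry is a dict with a `name` and an optional `must_understand` key that defaults to true. `SUPPORTED` and `decodable_chain` are hypothetical names for illustration, not part of any Zarr implementation:

```python
SUPPORTED = {"bytes", "gzip"}  # hypothetical: codecs this reader implements


def decodable_chain(codecs):
    """Return the codecs to apply when decoding, dropping skippable unknown ones."""
    chain = []
    for codec in codecs:
        name = codec["name"]
        if name in SUPPORTED:
            chain.append(codec)
        elif codec.get("must_understand", True):
            # Unknown codec that the reader is required to understand.
            raise ValueError(f"cannot read: unknown codec {name!r}")
        # Otherwise: unknown codec with must_understand=False. Its decoding
        # is assumed to be a pass-through, so the reader skips it.
    return chain


codecs = [
    {"name": "bitround", "must_understand": False},  # configuration omitted
    {"name": "bytes"},
    {"name": "gzip", "configuration": {"level": 3}},
]
names = [c["name"] for c in decodable_chain(codecs)]
```

Here the viewer decodes with only `bytes` and `gzip`, even though it has no bitround implementation.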

LDeakin avatar Jul 03 '25 21:07 LDeakin

I would argue that a writer should not be able to ignore those codecs (or any codec) when writing data. I think it's important that different implementations create the exact same chunks from the exact same input data, for a particular metadata document.

If we allow codecs to be ignored when writing, we can get arrays with identical metadata but different chunks, for the same input data, which seems really bad to me.

I think the problem you describe (codecs that can be ignored specifically when reading) would be better solved by a codec field that declares just that: whether the codec can be ignored for reading.

d-v-b avatar Jul 04 '25 06:07 d-v-b

@normanrz @joshmoore could you explain, or link to a discussion about, the decision to allow must_understand=False for codecs? I don't recall this from the discussion in #330 but perhaps I missed it.

d-v-b avatar Jul 04 '25 06:07 d-v-b

I agree that codecs should not be ignored when writing data. This is similar to what I was arguing for with extensions:

https://github.com/zarr-developers/zeps/pull/67#issuecomment-2907501279

must_understand: false in an extension implies to me that an implementation should support reading, but should not write unless it is actually aware of the extension and knows that it is okay to do so

https://github.com/zarr-developers/zeps/pull/67#issuecomment-2913481181

I agree with what @LDeakin said about must_understand --- must understand for writing should always implicitly be true and must_understand applies only for reading. That simplifies things nicely.
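The rule in that quote can be sketched as a single check, assuming codec entries are dicts with an optional `must_understand` key that defaults to true. The function name `may_proceed` and the `mode` argument are hypothetical and only illustrate the read/write asymmetry:

```python
def may_proceed(codec, known, mode):
    """Decide whether an implementation may continue; mode is "read" or "write"."""
    if codec["name"] in known:
        return True
    if mode == "write":
        # must_understand is implicitly true for writing.
        return False
    # For reading, only the declared flag matters.
    return not codec.get("must_understand", True)


known = {"bytes", "gzip"}
bitround = {"name": "bitround", "must_understand": False}
can_read = may_proceed(bitround, known, "read")
can_write = may_proceed(bitround, known, "write")
```

With this reading, `can_read` is true and `can_write` is false for an implementation that does not know bitround.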

LDeakin avatar Jul 04 '25 06:07 LDeakin

@d-v-b: I also don't remember an explicit examination of the impact of this, but more that there was a general drive to unify the extensions mechanism. Of course it's not adopted yet, but I do think there's consensus on @LDeakin's explanation of must_understand from ZEP10 and think it's a good way to clarify this oversight from ZEP9.

joshmoore avatar Jul 04 '25 06:07 joshmoore

Even if we add language like "must_understand: false means that an implementation may support reading, but not writing", this will be ambiguous until we define what "reading" and "writing" mean.

For arrays, I think reading / writing can mean two distinct things:

  • reading / writing array attributes
  • reading / writing chunks

For codecs, chunk key encodings, chunk grids, etc., I think reading and writing are narrowly scoped to chunks. To me, that means that if an implementation reads array metadata and encounters an unknown codec with must_understand: true, the implementation could still read or write the attributes field of array metadata.

If we agree on this, then we should add this clarifying language to the spec. I think this should be done separately from https://github.com/zarr-developers/zeps/pull/67.
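To make the proposed scoping concrete, here is a rough sketch, assuming array metadata is a dict with `codecs` and `attributes` keys. The class `ArrayHandle` and its methods are hypothetical and do not correspond to any existing API:

```python
class ArrayHandle:
    """Hypothetical handle: attribute access is independent of codec support."""

    def __init__(self, metadata, known_codecs):
        self.metadata = metadata
        # Unknown codecs that must be understood block chunk access only.
        self.blocking = [
            c["name"]
            for c in metadata["codecs"]
            if c["name"] not in known_codecs and c.get("must_understand", True)
        ]

    @property
    def attributes(self):
        # Always available, regardless of unknown codecs.
        return self.metadata.get("attributes", {})

    def check_chunk_access(self):
        # Would be called before any chunk read or write.
        if self.blocking:
            raise ValueError(f"unsupported codecs: {self.blocking}")


metadata = {
    "codecs": [{"name": "mystery", "must_understand": True}, {"name": "bytes"}],
    "attributes": {"units": "m"},
}
handle = ArrayHandle(metadata, known_codecs={"bytes"})
units = handle.attributes["units"]
```

In this sketch `handle.attributes` works, while `handle.check_chunk_access()` raises because of the unknown `mystery` codec.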

d-v-b avatar Jul 04 '25 07:07 d-v-b

food for thought: using separate read / write codec chains would remove the need for must_understand entirely:

"codecs": {
    "read": {
        "array-array": [],
        "array-bytes": "bytes",
        "bytes-bytes": ["gzip"]
    },
    "write": {
        "array-array": [{"name": "bitround", "configuration": {...}}],
        "array-bytes": "bytes",
        "bytes-bytes": [{"name": "gzip", "configuration": {"level": 3}}]
    }
}
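A rough sketch of how an implementation might consume this layout, assuming entries may be either bare names or dicts with a `name` key, as in the snippet above. The helper `flatten_chain` is hypothetical, and it simply lists codec names in declared order without deciding the order in which decoding applies them:

```python
def flatten_chain(codecs, mode):
    """Return codec names for the given mode ("read" or "write") in declared order."""
    stage = codecs[mode]
    entries = [*stage["array-array"], stage["array-bytes"], *stage["bytes-bytes"]]
    # Accept both the short form ("gzip") and the long form ({"name": "gzip", ...}).
    return [e if isinstance(e, str) else e["name"] for e in entries]


codecs = {
    "read": {
        "array-array": [],
        "array-bytes": "bytes",
        "bytes-bytes": ["gzip"],
    },
    "write": {
        "array-array": [{"name": "bitround"}],  # configuration omitted
        "array-bytes": "bytes",
        "bytes-bytes": [{"name": "gzip", "configuration": {"level": 3}}],
    },
}
read_names = flatten_chain(codecs, "read")
write_names = flatten_chain(codecs, "write")
```

A reader only needs to support the names in `read_names`, so bitround never shows up as something it has to understand.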

d-v-b avatar Jul 04 '25 08:07 d-v-b

This is nice in that it cleanly separates encode and decode parameters, but on the other hand it is more verbose in the common case where all codecs apply to both reading and writing.

jbms avatar Jul 29 '25 21:07 jbms