zarr-python

compatibility with the v3 bytes dtype

Open d-v-b opened this issue 2 months ago • 3 comments

There's a v3 data type definition for a variable-length bytes data type: https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/bytes, which was not on my radar when I added variable-length bytes support in #2874.

The v3 bytes data type is incompatible with the VariableLengthBytes data type that I implemented in #2874. The differences are:

| | data type identifier | fill value |
|---|---|---|
| v3 bytes dtype | "bytes" | array of ints (one per byte) |
| Zarr Python VariableLengthBytes dtype | "variable_length_bytes" | string (base64-encoded bytes) |
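To make the difference concrete, here is a small sketch of how the same fill value would look in each JSON form. It only uses the Python standard library, the example bytes are arbitrary, and the use of the standard base64 alphabet for the Zarr Python form is an assumption on my part rather than something stated in this issue.

```python
import base64
import json

# Arbitrary example fill value (illustrative only).
fill = b"\x00\x01\xff"

# v3 bytes dtype: the fill value is a JSON array of ints, one per byte.
as_int_array = list(fill)

# Zarr Python VariableLengthBytes: the fill value is a base64-encoded string.
# Assumption: standard base64 alphabet.
as_base64 = base64.b64encode(fill).decode("ascii")

# Only the two relevant metadata fields are shown, not a full array document.
print(json.dumps({"data_type": "bytes", "fill_value": as_int_array}))
print(json.dumps({"data_type": "variable_length_bytes", "fill_value": as_base64}))
```

The two forms carry the same bytes, but a reader that expects one will not parse the other without an explicit compatibility path.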

As an ecosystem, we should probably not have two nearly identical data types, which argues for consolidating them. Since the VariableLengthBytes data type doesn't have a spec, I think its current behavior should be deprecated, and we should either modify it to comply with the v3 bytes data type spec or introduce a brand new data type class that complies with that spec.

Either way, we can stay compatible with older data by treating "variable_length_bytes" as an alias for "bytes", and by reading (but not writing) the base64-encoded fill value.
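A rough sketch of what that read-side compatibility could look like, under some assumptions: the legacy identifier is taken to be "variable_length_bytes", and the function names (`normalize_dtype_name`, `read_fill_value`, `write_fill_value`) are hypothetical helpers for illustration, not existing zarr-python API.

```python
import base64

# Assumption: the legacy data type identifier written by older zarr-python.
LEGACY_NAMES = {"variable_length_bytes"}


def normalize_dtype_name(name: str) -> str:
    """Map the legacy identifier onto the v3 "bytes" identifier (hypothetical helper)."""
    if name == "bytes" or name in LEGACY_NAMES:
        return "bytes"
    raise ValueError(f"not a variable-length bytes data type: {name!r}")


def read_fill_value(raw: object) -> bytes:
    """Accept both the spec form (list of ints) and the legacy base64 string."""
    if isinstance(raw, str):
        # Legacy form: base64-encoded string. Read-only compatibility.
        return base64.b64decode(raw, validate=True)
    if isinstance(raw, list) and all(
        isinstance(b, int) and not isinstance(b, bool) and 0 <= b <= 255 for b in raw
    ):
        # Spec form: one int per byte.
        return bytes(raw)
    raise TypeError(f"unsupported fill value: {raw!r}")


def write_fill_value(value: bytes) -> list[int]:
    """Always write the spec form, never the legacy base64 string."""
    return list(value)
```

With this shape, older metadata stays readable, while anything written goes out in the spec-compliant form only.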

Any thoughts or preferences between these two options? Modifying the JSON form of the existing data type would break the ability of older versions of zarr-python to read the data type metadata, although we did flag this risk loudly through warnings on the data type.

d-v-b, Oct 10 '25 20:10

cc @kylebarron , @LDeakin

d-v-b, Oct 10 '25 20:10

vlen-bytes is the codec for that data type, but the data type name is variable_length_bytes.

It wouldn't seem unreasonable to me to extend the bytes spec to allow base64-encoded string fill values as well.

LDeakin, Oct 11 '25 11:10

> vlen-bytes is the codec for that data type, but the data type name is variable_length_bytes.

Oops, good catch! I updated the table with this correction.

d-v-b, Oct 11 '25 13:10