zarr-python

v2.metadata and v3.metadata encode `fill_value` bytes differently

Open rabernat opened this issue 1 year ago • 0 comments

Here I am creating an array and specifying the fill_value as the raw bytes b'X':

import zarr

fv = b'X'

a = zarr.create(shape=10, dtype=bytes, zarr_version=2, fill_value=fv)
ad = a.metadata.to_dict()
print(ad)
# -> {'shape': (10,), 'fill_value': 'WA==', 'attributes': {}, 'zarr_format': 2, 'order': 'C', 'filters': None, 'dimension_separator': '.', 'compressor': None, 'chunks': (10,), 'dtype': '|S0'}


b = zarr.create(shape=10, dtype=bytes, zarr_version=3, fill_value=fv)
bd = b.metadata.to_dict()
print(bd)
# -> {'shape': (10,), 'fill_value': (88,), 'chunk_grid': {'name': 'regular', 'configuration': {'chunk_shape': (10,)}}, 'attributes': {}, 'zarr_format': 3, 'data_type': <DataType.bytes: 'bytes'>, 'chunk_key_encoding': {'name': 'default', 'configuration': {'separator': '/'}}, 'codecs': ({'name': 'vlen-bytes', 'configuration': {}},), 'node_type': 'array', 'storage_transformers': ()}

assert zarr.core.metadata.v2.ArrayV2Metadata.from_dict(ad).fill_value == fv
assert zarr.core.metadata.v3.ArrayV3Metadata.from_dict(bd).fill_value == fv

As we can see, the fill value is encoded quite differently in the two cases. Remarkably, it gets translated back to something reasonable in both.

In both cases, the bytes are going through this path: https://github.com/zarr-developers/zarr-python/blob/aa46b451ae6a83e1befc2525ec9629953949aa79/src/zarr/abc/metadata.py#L33-L34

This converts the bytes to a tuple of ints.
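As a rough illustration of that conversion (plain Python, not the zarr code itself): iterating over a bytes object yields ints, so wrapping it in a tuple gives the (88,) seen in the v3 output, and bytes() reverses it.

```python
# Plain-Python sketch; this is not the zarr implementation.
fv = b"X"

# Iterating over bytes yields one int per byte, so tuple() gives ints.
as_ints = tuple(fv)
print(as_ints)  # (88,)

# bytes() accepts an iterable of ints in [0, 255], so the round trip is lossless.
restored = bytes(as_ints)
assert restored == fv
```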

However, for v2, #2286 added this additional special handling for fill_value:

https://github.com/zarr-developers/zarr-python/blob/aa46b451ae6a83e1befc2525ec9629953949aa79/src/zarr/core/metadata/v2.py#L146-L150
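The 'WA==' in the v2 output is what you get from base64-encoding b'X'. Here is a minimal standard-library sketch of that round trip; it only reproduces the observed value and is not a copy of the linked zarr lines.

```python
import base64

fv = b"X"

# Encode the raw bytes as an ASCII base64 string, as seen in the v2 metadata dict.
encoded = base64.standard_b64encode(fv).decode("ascii")
print(encoded)  # WA==

# Decoding the string recovers the original bytes.
decoded = base64.standard_b64decode(encoded)
assert decoded == fv
```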

According to the V3 spec:

Raw data types (r<N>): An array of integers, with length equal to <N>, where each integer is in the range [0, 255].

This seems in line with what is happening in the v3 case, where the fill value is encoded as (88,).
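To make the quoted rule concrete, here is a small sketch of a check that follows its wording. The helper name is_valid_raw_fill_value and its length parameter are hypothetical, introduced only for illustration; nothing like this is claimed to exist in zarr-python.

```python
def is_valid_raw_fill_value(value, length):
    """Hypothetical helper: check a fill value against the quoted wording.

    The value must be a sequence of `length` integers, each in [0, 255].
    """
    items = list(value)
    if len(items) != length:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(i, int) and not isinstance(i, bool) and 0 <= i <= 255
        for i in items
    )


print(is_valid_raw_fill_value((88,), 1))      # True
print(is_valid_raw_fill_value((88, 300), 2))  # False
```

Under this reading, the (88,) produced for b'X' passes, while any element outside [0, 255] or a length mismatch is rejected.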

This is relevant to https://github.com/pydata/xarray/issues/5475

rabernat, Oct 09 '24 13:10