zarr-python
Reading data that was written with deprecated `bytes` codec
Prior to 3.1, it was possible to write an array whose metadata looked like this:
doc = {
    'shape': [],
    'data_type': 'bytes',
    'chunk_grid': {'name': 'regular', 'configuration': {'chunk_shape': []}},
    'chunk_key_encoding': {'name': 'default',
                           'configuration': {'separator': '/'}},
    'fill_value': [],
    'codecs': [{'name': 'vlen-bytes', 'configuration': {}},
               {'name': 'zstd', 'configuration': {'level': 0, 'checksum': False}}],
    'attributes': {},
    'zarr_format': 3,
    'node_type': 'array',
    'storage_transformers': []
}
Attempting to load this metadata raises an error:
import zarr
zarr.core.metadata.ArrayV3Metadata.from_dict(doc)
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/zarr/core/dtype/registry.py:208, in DataTypeRegistry.match_json(self, data, zarr_format)
    206     except DataTypeValidationError:
    207         pass
--> 208 raise ValueError(f"No Zarr data type found that matches {data!r}")
ValueError: No Zarr data type found that matches 'bytes'
The following tweaks make it loadable:
doc["data_type"] = "variable_length_bytes"
doc["fill_value"] = ""
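Until an alias exists, the two tweaks above can be wrapped in a small helper that patches a metadata document before it is parsed. This is only a sketch: `patch_legacy_bytes_metadata` is a hypothetical name, not part of zarr-python, and it assumes the only legacy fill value that needs rewriting is the empty list seen in this report.

```python
import copy

# Hypothetical helper for illustration; not part of zarr-python.
LEGACY_DTYPE = "bytes"
CURRENT_DTYPE = "variable_length_bytes"


def patch_legacy_bytes_metadata(doc: dict) -> dict:
    """Return a patched copy of a Zarr v3 array metadata document.

    Only documents whose data_type is the legacy 'bytes' name are changed;
    everything else is returned as an unmodified copy.
    """
    patched = copy.deepcopy(doc)  # leave the caller's dict untouched
    if patched.get("data_type") == LEGACY_DTYPE:
        patched["data_type"] = CURRENT_DTYPE
        # Assumption: an empty-list fill value stands for "empty bytes",
        # so it is replaced with "" as in the manual tweak above.
        if patched.get("fill_value") == []:
            patched["fill_value"] = ""
    return patched
```

The patched dict can then be passed to `zarr.core.metadata.ArrayV3Metadata.from_dict` in place of the original `doc`.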
It would be nice if we
- Had an alias for the deprecated `bytes` dtype to `variable_length_bytes`
- Could deal with `fill_value = []` here
Otherwise data that was written with older Zarr versions is not interoperable with new ones.
We should definitely add an alias for `bytes`. The runtime fix would be to change this function.
In fact, we should probably change the JSON signature of the vlen bytes data type to `{"name": "bytes"}`, because that name does have a spec, which would let us get rid of an annoying warning.