zarr-python
disallow 0-length fixed-size data types
NumPy lets you define 0-length data types, but they are useless in the context of arrays:
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "numpy",
# ]
# ///
import numpy as np
dt = np.dtype('str')
print(dt.itemsize)
# 0
arr = np.array([], dtype=dt)
print(arr.dtype)
# <U1
print(arr.dtype == dt)
# False
```
NumPy's data type API happily treats the user's requested data type as a suggestion rather than an instruction. Oh well.
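To make the NumPy behavior above concrete, here is a minimal sketch of two helpers: one that flags 0-length fixed-size data types and one that reports the data type NumPy actually uses when it builds an array. `is_zero_length_fixed_size` and `effective_dtype` are hypothetical names for illustration, not part of NumPy or zarr-python, and the sketch assumes only the `S` (fixed-length bytes) and `U` (fixed-length UTF-32 string) kinds matter here.

```python
import numpy as np


def is_zero_length_fixed_size(dt) -> bool:
    """Hypothetical helper: True for 'S'/'U' dtypes whose itemsize is 0."""
    dt = np.dtype(dt)
    # 'S' = fixed-length bytes, 'U' = fixed-length UTF-32 strings
    return dt.kind in "SU" and dt.itemsize == 0


def effective_dtype(dt) -> np.dtype:
    """Hypothetical helper: the dtype NumPy actually gives an array built with dt."""
    return np.array([], dtype=dt).dtype


for requested in (np.dtype(str), np.dtype(bytes), np.dtype("U3")):
    actual = effective_dtype(requested)
    print(
        repr(requested),
        "zero-length:", is_zero_length_fixed_size(requested),
        "array dtype:", repr(actual),
        "matches:", actual == requested,
    )
```

Only the 0-length requests fail to round-trip; a sized type such as `U3` comes back unchanged.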
In zarr-python 2, these 0-length data types led to inconsistent behavior: the data type of a Zarr array did not match the data type of the NumPy arrays returned when indexing that array:
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr==2.18",
# "numcodecs<0.16"
# ]
# ///
import zarr
array = zarr.create(store={}, shape=(1,), dtype='U0')
array[:] = ''
print(array.dtype)
# <U0
print(array[:].dtype)
# <U1
print(array.dtype == array[:].dtype)
# False
```
A few options for zarr-python, ranked from worst to best:
1. Copy NumPy's bad behavior: if a user requests a 0-length data type, silently replace it with a 1-length version.
2. Allow creating 0-length data types, but raise an exception when creating an array with one.
3. Disallow the creation of 0-length data types at the data type level.
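The three options can be sketched in plain NumPy. This is a sketch under stated assumptions, not zarr-python's actual API: `coerce_zero_length`, `reject_zero_length`, and the `FixedLengthUTF32` class are hypothetical names, and only the `S` and `U` kinds are considered.

```python
from dataclasses import dataclass

import numpy as np


def _is_zero_length(dt: np.dtype) -> bool:
    # Fixed-size bytes ('S') or UTF-32 string ('U') dtype with no room for data.
    return dt.kind in "SU" and dt.itemsize == 0


# Option 1 (hypothetical): copy NumPy and silently widen to length 1.
def coerce_zero_length(dt) -> np.dtype:
    dt = np.dtype(dt)
    if _is_zero_length(dt):
        return np.dtype(f"{dt.kind}1")
    return dt


# Option 2 (hypothetical): the dtype may exist, but array creation rejects it.
def reject_zero_length(dt) -> np.dtype:
    dt = np.dtype(dt)
    if _is_zero_length(dt):
        raise ValueError(f"cannot create an array with 0-length dtype {dt!r}")
    return dt


# Option 3 (hypothetical): the data type itself refuses a length below 1.
@dataclass(frozen=True)
class FixedLengthUTF32:
    length: int  # number of characters

    def __post_init__(self) -> None:
        if self.length < 1:
            raise ValueError(f"length must be >= 1, got {self.length}")

    def to_numpy(self) -> np.dtype:
        return np.dtype(f"U{self.length}")


print(coerce_zero_length(str))
print(FixedLengthUTF32(3).to_numpy())
try:
    FixedLengthUTF32(0)
except ValueError as exc:
    print("rejected:", exc)
```

With option 3 the error surfaces as early as possible, so a 0-length type never reaches array metadata; options 1 and 2 both let it exist and deal with it later.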
I think option 3 makes the most sense, so I'm going to open a PR with that. But we can talk about alternatives here as well. Maybe there is a big market for these 0-length data types that I am not aware of.
addresses part of #3167