disallow 0-length fixed-size data types

Open d-v-b opened this issue 6 months ago • 0 comments

NumPy lets you define 0-length data types, but they are useless in the context of arrays:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "numpy",
# ]
# ///

import numpy as np

dt = np.dtype('str')
print(dt.itemsize)
# 0
arr = np.array([], dtype=dt)
print(arr.dtype)
# <U1
print(arr.dtype == dt)
# False

NumPy's data type API happily treats the user's requested data type as a suggestion rather than an instruction. Oh well.

In zarr-python 2, these 0-length data types led to inconsistent behavior, where the data type of an array did not match the data type of the NumPy arrays returned when indexing that array:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr==2.18",
#   "numcodecs<0.16"
# ]
# ///

import zarr

array = zarr.create(store={}, shape=(1,), dtype='U0')
array[:] = ''
print(array.dtype)
# <U0
print(array[:].dtype)
# <U1
print(array.dtype == array[:].dtype)
# False

A few options for zarr-python, ranked by goodness, ascending:

  1. copy NumPy's bad behavior: if a user requests a 0-length data type, silently replace it with a 1-length version.
  2. allow creating 0-length data types, but raise an exception when creating an array with one.
  3. disallow the creation of 0-length data types at the data type level.

I think option 3 makes the most sense, so I'm going to open a PR with that. But we can talk about alternatives here as well. Maybe there is a big market for these 0-length data types that I am not aware of.
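For discussion, here is a minimal sketch of what rejecting 0-length types at the data type level could look like. It uses only NumPy; `reject_zero_length` and `_SIZED_KINDS` are hypothetical names made up for illustration and are not part of zarr-python's API, and the actual PR may structure the check differently.

```python
import numpy as np

# NumPy kind codes for fixed-size types whose length is part of the dtype:
# "S" (bytes), "U" (unicode), "V" (void / raw bytes).
_SIZED_KINDS = frozenset("SUV")


def reject_zero_length(dtype):
    """Return `dtype` as a NumPy dtype, or raise if it is a 0-length fixed-size type.

    Hypothetical helper for illustration; not part of zarr-python's API.
    """
    dt = np.dtype(dtype)
    if dt.kind in _SIZED_KINDS and dt.itemsize == 0:
        raise ValueError(
            f"0-length data type {dt!r} is not allowed; specify a length of at least 1"
        )
    return dt


# A sized string type passes through unchanged.
print(reject_zero_length("U4"))

# The 0-length type from the NumPy example above is rejected up front.
try:
    reject_zero_length("str")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Failing at this point means the user sees the error when they request the data type, rather than ending up with an array whose declared data type differs from the data it returns.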

addresses part of #3167

d-v-b, Jun 25 '25 08:06