fixed length subtype recarray + "auto" shards crashes

Open ilan-gold opened this issue 2 months ago • 5 comments

Zarr version

3.1.4.dev29+gfc8e8ad1a

Numcodecs version

0.16.3

Python Version

3.12.3

Operating System

macOS-15.1-arm64-arm-64bit

Installation

uv pip

Description

The reproducer below fails with shards="auto" but works otherwise.

Here is the traceback:

/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py:4674: ZarrUserWarning: Automatic shard shape inference is experimental and may change without notice.
  shard_shape_parsed, chunk_shape_parsed = _auto_partition(
/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/dtype/npy/structured.py:318: UnstableSpecificationWarning: The data type (Structured(fields=(('PyvCr', FixedLengthUTF32(length=4, endianness='little')), ('UWJNo', FixedLengthUTF32(length=4, endianness='little'))))) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/dtype/npy/string.py:249: UnstableSpecificationWarning: The data type (FixedLengthUTF32(length=4, endianness='little')) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
Traceback (most recent call last):
  File "/Users/ilangold/Projects/Theis/anndata/new_tester.py", line 61, in <module>
    f[...] = arr
    ~^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py", line 2966, in __setitem__
    self.set_basic_selection(cast("BasicSelection", pure_selection), value, fields=fields)
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py", line 3200, in set_basic_selection
    sync(self._async_array._set_selection(indexer, value, fields=fields, prototype=prototype))
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/sync.py", line 159, in sync
    raise return_result
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/sync.py", line 119, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/array.py", line 1735, in _set_selection
    await self.codec_pipeline.write(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 486, in write
    await concurrent_map(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 100, in concurrent_map
    return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 98, in run
    return await func(*item)
           ^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 352, in write_batch
    await self.encode_partial_batch(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 247, in encode_partial_batch
    await self.array_bytes_codec.encode_partial(batch_info)
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/abc/codec.py", line 265, in encode_partial
    await concurrent_map(
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 100, in concurrent_map
    return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/core/common.py", line 98, in run
    return await func(*item)
           ^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/Library/Caches/uv/environments-v2/new-tester-8e6728b281f68c98/lib/python3.12/site-packages/zarr/codecs/sharding.py", line 603, in _encode_partial_single
    chunks_per_shard = self._get_chunks_per_shard(shard_spec)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 3, in __hash__
TypeError: unhashable type: 'writeable void-scalar'

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
#   "numpy",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
from __future__ import annotations

import numpy as np
import zarr

# your reproducer code
zarr.print_debug_info()

arr = np.rec.array([('sQF', 'SQC'), ('XVut', 'XNsc'), ('HBz', 'xRL'),
           ('fuf', 'pyld'), ('Osuh', 'tRF'), ('PIpC', 'zzN'),
           ('YDyZ', 'MlJ'), ('RnG', 'PdF'), ('AHQ', 'uSc'),
           ('sRh', 'spmy')],
          dtype=[('btHIM', '<U4'), ('HLuXc', '<U4')])

g = zarr.open("foo.zarr", mode="w")
f = g.create_array("rec", shape=arr.shape, dtype=arr.dtype, shards="auto")
f[...] = arr

Additional output

No response

ilan-gold, Oct 24 '25 15:10

my first guess, based on the traceback, is that a numpy dtype is getting used in place of a zarr data type.

d-v-b, Oct 24 '25 15:10

Is it necessary to use a zarr data type here? From https://zarr.readthedocs.io/en/stable/user-guide/data_types.html#data-types-in-zarr-python

In this context, a “native” data type is a Python class, typically defined in another library, that models an array’s data type. For example, np.dtypes.UInt8DType is a native data type defined in NumPy. Zarr Python wraps the NumPy uint8 with a ZDType instance called [UInt8](https://zarr.readthedocs.io/en/stable/api/zarr/dtype/index.html#zarr.dtype.ZDType).

As of this writing, the only native data types Zarr Python supports are NumPy data types.

would make me think it would be supported natively as a Python class, as passed in here. And this works without sharding; it only fails with shards="auto".

I also tried dtype=[("btHIM", "<U4"), ("HLuXc", "<U4")] as in this section and dtype=zarr.dtype.parse_dtype(arr.dtype, zarr_format=3), to no avail.

ilan-gold, Oct 24 '25 16:10

this looks relevant:

https://github.com/zarr-developers/zarr-python/blob/fe42655ae265e045f850e12f30726aa8668d6dde/src/zarr/codecs/sharding.py#L362-L367

d-v-b, Oct 24 '25 16:10

Nice, ok! It looks like this issue is a bit of a no-op for now, then. Feel free to close it if it duplicates the other one.

ilan-gold, Oct 24 '25 16:10

i do think it's worth having this issue as indicating either a new cached attribute we need to comment out 😒 or even more reason to fix the underlying issue

d-v-b, Oct 24 '25 16:10