Numcodecs Delta filter throws AttributeError when astype is specified

Open K-Meech opened this issue 5 months ago • 7 comments

Zarr version

v3.1.0

Numcodecs version

v0.16.1

Python Version

3.13

Operating System

Windows

Installation

pip into a virtual environment

Description

Running the code snippet in the 'Steps to reproduce' section below throws the following error:

  File "C:\Users\kimme\miniforge3\envs\zarr-stock-only-2\Lib\site-packages\zarr\core\common.py", line 89, in run
    return await func(*item)
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\kimme\miniforge3\envs\zarr-stock-only-2\Lib\site-packages\zarr\abc\codec.py", line 447, in wrap
    return await func(chunk, chunk_spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kimme\miniforge3\envs\zarr-stock-only-2\Lib\site-packages\zarr\codecs\bytes.py", line 80, in _decode_single
    dtype = chunk_spec.dtype.to_native_dtype()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'numpy.dtypes.Int32DType' object has no attribute 'to_native_dtype'

In previous zarr-python versions (e.g. 3.0.10), this passes without error. Removing the astype option from Delta also makes the code pass without error.
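For context, here is a minimal NumPy-only sketch of what a delta filter with dtype and astype options computes: the first value is kept, and each later value is stored as its difference from the previous one, cast to astype. The function names delta_encode and delta_decode are hypothetical and this is not the numcodecs implementation, only an illustration of the technique under that assumption.

```python
import numpy as np


def delta_encode(arr, dtype="<i4", astype="<i4"):
    """Keep the first value, then store successive differences as `astype`."""
    # Flatten the input and interpret it as the decoded dtype.
    flat = np.asarray(arr, dtype=dtype).reshape(-1)
    enc = np.empty(flat.shape, dtype=astype)
    if flat.size:
        enc[0] = flat[0]
        enc[1:] = np.diff(flat)
    return enc


def delta_decode(enc, dtype="<i4"):
    """Invert delta_encode with a cumulative sum, returning values as `dtype`."""
    return np.cumsum(np.asarray(enc), dtype=dtype)


# Hypothetical sample chunk, much smaller than the arrays in this report.
chunk = np.arange(10, 20, dtype="<i4")
encoded = delta_encode(chunk)
decoded = delta_decode(encoded)
```

The encoded values are small and repetitive for slowly varying data, which is why a filter like this is usually placed ahead of a compressor.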

Steps to reproduce

Code snippet based on example from zarr-python docs with different dtype / astype options to the Delta filter. This throws an error:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

from numcodecs.zarr3 import Delta
import numpy as np
import zarr

filters = [Delta(dtype='<i4', astype='<i4')]
compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=1, shuffle=zarr.codecs.BloscShuffle.shuffle)
data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(
    store='data/delta-filter-v3.zarr', 
    shape=data.shape, 
    dtype=data.dtype, 
    chunks=(1000, 1000), 
    filters=filters, 
    compressors=compressors, 
    zarr_format=3
)
z[:] = 1

zarr_read = zarr.open("data/delta-filter-v3.zarr", zarr_format=3, mode="r+")
print(zarr_read[:])

The equivalent code for a v2 array passes without error:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import numcodecs
import numpy as np
import zarr

filters = [numcodecs.Delta(dtype='<i4', astype='<i4')]
compressors = numcodecs.Blosc(cname="zstd", clevel=1, shuffle=1)
data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(
    store='data/delta-filter-v2.zarr', 
    shape=data.shape, 
    dtype=data.dtype, 
    chunks=(1000, 1000), 
    filters=filters, 
    compressors=compressors, 
    zarr_format=2
)
z[:] = 1

zarr_read = zarr.open("data/delta-filter-v2.zarr", zarr_format=2, mode="r+")
print(zarr_read[:])

Additional output

No response

K-Meech avatar Jul 16 '25 10:07 K-Meech

thanks for this report! I suspect this won't be too much work to fix; I'll try to get something out soon

d-v-b avatar Jul 16 '25 12:07 d-v-b

I suspect this won't be too much work to fix

how wrong I was... this issue exposed two problems:

  • The numcodecs Delta codec works with NumPy data types, but we recently refactored zarr-python to use a different data type abstraction. That mismatch is the cause of the error you are seeing. One fix would be to write a corrected version of the Delta codec and register it with zarr-python. However...
  • Our codec registry is broken, so it's not possible to register a corrected version of a numcodecs codec. See https://github.com/zarr-developers/zarr-python/issues/3261
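The mismatch described in the first point can be sketched without zarr installed. The traceback shows code calling to_native_dtype() on an object that turned out to be a plain NumPy dtype. The helper as_native_dtype and the class HypotheticalZDType below are illustrative names only, not part of zarr-python or numcodecs, and this is not the fix that was eventually merged; it only shows, under those assumptions, how duck typing can accept either kind of object.

```python
import numpy as np


class HypotheticalZDType:
    """Stand-in for a dtype wrapper that exposes to_native_dtype().

    This class is an assumption for illustration; it is not zarr's own class.
    """

    def __init__(self, native):
        self._native = np.dtype(native)

    def to_native_dtype(self):
        return self._native


def as_native_dtype(dtype):
    """Return a NumPy dtype from a wrapper object or a NumPy-compatible dtype."""
    to_native = getattr(dtype, "to_native_dtype", None)
    if callable(to_native):
        # Wrapper-style object: ask it for the underlying NumPy dtype.
        return to_native()
    # Plain NumPy dtype (or anything np.dtype accepts, such as "<i2").
    return np.dtype(dtype)


# A plain NumPy dtype has no to_native_dtype attribute, hence the AttributeError.
plain = np.dtype("int32")
wrapped = HypotheticalZDType("int16")
```

Calling as_native_dtype(plain) and as_native_dtype(wrapped) both return NumPy dtypes, whereas calling plain.to_native_dtype() directly raises the AttributeError shown in the report.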

d-v-b avatar Jul 17 '25 08:07 d-v-b

Thanks for looking into this @d-v-b !

K-Meech avatar Jul 17 '25 09:07 K-Meech

I am also seeing the same error when specifying astype with the FixedScaleOffset filter:

filter = numcodecs.zarr3.FixedScaleOffset(offset=0, scale=100, dtype='float32', astype='int16')
encoding = {'var': { 'filters': [filter] }}
ds.to_zarr('/tmp/test.zarr',  encoding=encoding, mode='w')
_ds = xr.open_dataset('/tmp/test.zarr')
_ds['var'].max()  # index by name: _ds.var is the Dataset.var() method, not the variable

AttributeError: 'numpy.dtypes.Int16DType' object has no attribute 'to_native_dtype'

cas-- avatar Jul 17 '25 18:07 cas--

Yes, effectively every dtype-sensitive codec in numcodecs.zarr3 is broken because of the recent dtypes change. We have to resolve #3261 and/or push a fix to numcodecs before this can be sorted out.

d-v-b avatar Jul 17 '25 18:07 d-v-b

i have a fix in https://github.com/zarr-developers/numcodecs/pull/766

d-v-b avatar Jul 18 '25 07:07 d-v-b

This seems to be fixed for me with the latest release, zarr 3.1.3. Great work resolving this, @d-v-b!
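The version reports scattered through this thread can be collected into a small lookup. This is a sketch only: the function names are hypothetical, the table records only what commenters here reported (3.0.10 working, 3.1.0 failing, 3.1.3 working for one commenter), and it says nothing about other releases or about which numcodecs version is installed.

```python
import re

# Statuses as reported in this thread only; other releases are not covered.
REPORTED_STATUS = {
    (3, 0, 10): "reported working",
    (3, 1, 0): "reported broken",
    (3, 1, 3): "reported working",
}


def parse_release(version):
    """Extract the leading (major, minor, patch) integers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def astype_bug_status(zarr_version):
    """Look up what this thread reported for the given zarr version string."""
    return REPORTED_STATUS.get(parse_release(zarr_version), "not reported in this thread")


# With zarr installed, the version string could come from
# importlib.metadata.version("zarr"); a literal is used here instead.
status = astype_bug_status("3.1.0")
```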

cas-- avatar Sep 30 '25 15:09 cas--