Numcodecs Delta filter throws AttributeError when astype is specified
Zarr version
v3.1.0
Numcodecs version
v0.16.1
Python Version
3.13
Operating System
Windows
Installation
pip into a virtual environment
Description
Running the code snippet in the 'steps to reproduce' section below throws the following error:
File "C:\Users\kimme\miniforge3\envs\zarr-stock-only-2\Lib\site-packages\zarr\core\common.py", line 89, in run
return await func(*item)
^^^^^^^^^^^^^^^^^
File "C:\Users\kimme\miniforge3\envs\zarr-stock-only-2\Lib\site-packages\zarr\abc\codec.py", line 447, in wrap
return await func(chunk, chunk_spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\kimme\miniforge3\envs\zarr-stock-only-2\Lib\site-packages\zarr\codecs\bytes.py", line 80, in _decode_single
dtype = chunk_spec.dtype.to_native_dtype()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'numpy.dtypes.Int32DType' object has no attribute 'to_native_dtype'
In previous zarr-python versions (e.g. 3.0.10), this passes without error. Removing the astype option to Delta also results in the code passing without error.
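For readers unfamiliar with the filter, here is a minimal plain-Python sketch of delta encoding. It is only an illustration of the idea, not numcodecs' implementation, and the helper names `delta_encode` and `delta_decode` are hypothetical. In numcodecs, dtype is the data type of the decoded data and astype is the data type used to store the encoded differences; this sketch ignores dtypes entirely and works on Python ints.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Undo delta_encode with a running sum."""
    return list(accumulate(deltas))


values = [100, 101, 103, 106, 110]
encoded = delta_encode(values)
print(encoded)                          # [100, 1, 2, 3, 4]
print(delta_decode(encoded) == values)  # True
```

The differences are usually small, which is why a narrower astype can be attractive for the encoded form.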
Steps to reproduce
The code snippet below is based on an example from the zarr-python docs, with different dtype / astype options passed to the Delta filter. It throws an error:
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
from numcodecs.zarr3 import Delta
import numpy as np
import zarr
filters = [Delta(dtype='<i4', astype='<i4')]
compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=1, shuffle=zarr.codecs.BloscShuffle.shuffle)
data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(
store='data/delta-filter-v3.zarr',
shape=data.shape,
dtype=data.dtype,
chunks=(1000, 1000),
filters=filters,
compressors=compressors,
zarr_format=3
)
z[:] = 1
zarr_read = zarr.open("data/delta-filter-v3.zarr", zarr_format=3, mode="r+")
print(zarr_read[:])
The equivalent code for a v2 array passes without error:
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
import numcodecs
import numpy as np
import zarr
filters = [numcodecs.Delta(dtype='<i4', astype='<i4')]
compressors = numcodecs.Blosc(cname="zstd", clevel=1, shuffle=1)
data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(
store='data/delta-filter-v2.zarr',
shape=data.shape,
dtype=data.dtype,
chunks=(1000, 1000),
filters=filters,
compressors=compressors,
zarr_format=2
)
z[:] = 1
zarr_read = zarr.open("data/delta-filter-v2.zarr", zarr_format=2, mode="r+")
print(zarr_read[:])
Additional output
No response
thanks for this report! I suspect this won't be too much work to fix; I'll try to get something out soon
> I suspect this won't be too much work to fix
how wrong I was... this issue exposed two problems:
- the numcodecs.delta codec uses NumPy data types, but we recently refactored zarr python to use a different data type abstraction, which is the cause of the acute error you are seeing. A fix would be to write a fixed version of the delta codec, and register that codec with zarr python. however...
- Our codec registry is broken, and it's not possible to register a fixed version of a numcodecs codec. See https://github.com/zarr-developers/zarr-python/issues/3261
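The mismatch described above can be sketched without zarr or NumPy installed. The classes `RawDType` and `WrappedDType` and the helper `native_dtype` are hypothetical stand-ins made up for this illustration; only the method name `to_native_dtype` comes from the traceback. This is one way a caller could tolerate both forms, not the fix that was actually applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDType:
    """Hypothetical stand-in for a plain NumPy dtype (no wrapper methods)."""
    name: str


@dataclass(frozen=True)
class WrappedDType:
    """Hypothetical stand-in for the newer data type abstraction."""
    raw: RawDType

    def to_native_dtype(self) -> RawDType:
        return self.raw


def native_dtype(dtype):
    """Return the raw dtype whether or not `dtype` is wrapped."""
    convert = getattr(dtype, "to_native_dtype", None)
    return convert() if callable(convert) else dtype


raw = RawDType("int32")
wrapped = WrappedDType(raw)

# Calling the wrapper method on a raw dtype fails, as in the traceback.
try:
    raw.to_native_dtype()
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# The duck-typed helper accepts both forms and yields the same raw dtype.
print(native_dtype(raw) == native_dtype(wrapped))
```

A codec that hands a raw dtype to code expecting the wrapper fails in the same way as the first case.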
Thanks for looking into this @d-v-b !
I am also seeing the same error but when specifying astype with FixedScaleOffset filter:
filter = numcodecs.zarr3.FixedScaleOffset(offset=0, scale=100, dtype='float32', astype='int16')
encoding = {'var': { 'filters': [filter] }}
ds.to_zarr('/tmp/test.zarr', encoding=encoding, mode='w')
_ds = xr.open_dataset('/tmp/test.zarr')
_ds['var'].max()
AttributeError: 'numpy.dtypes.Int16DType' object has no attribute 'to_native_dtype'
yes, effectively every dtype-sensitive codec in numcodecs.zarr3 is broken because of the recent dtypes change. we have to resolve #3261 and / or push a fix to numcodecs before this can be sorted out.
i have a fix in https://github.com/zarr-developers/numcodecs/pull/766
This seems to be fixed for me with the latest release of zarr, 3.1.3. Great work resolving this, @d-v-b