zarr-python
N5ChunkWrapper bug with uint8
Zarr version
v2.18.4
Numcodecs version
0.51.1
Python Version
3.11.11
Operating System
Linux
Installation
conda-forge
Description
N5ChunkWrapper encode/decode doesn't work correctly with uint8. I tested it with multiple dtypes, and all of them work except uint8.
Steps to reproduce
import numcodecs
from zarr.n5 import N5ChunkWrapper
import numpy as np
shape = (100, 100, 100)
types = [
    np.float32,
    np.uint8,
    np.int64,
    np.float16,
]
for t in types:
    try:
        data = np.ones(shape, dtype=t)
        n5_encoder = N5ChunkWrapper(data.dtype, shape, compressor=numcodecs.Zstd())
        encoded_data = n5_encoder.encode(data)
        decoded_data = n5_encoder.decode(encoded_data)
        decoded_data = decoded_data.reshape(shape)
        if not np.array_equal(data, decoded_data):
            print(f"Decoded data does not match original data - {t.__name__}")
    except Exception as e:
        print(f"Error processing data of type {t.__name__}: {e}")
@d-v-b @bogovicj any idea why this is happening?
here's a clear example:
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr==2.18",
# "numcodecs<0.16"
# ]
# ///
import numcodecs
from zarr.n5 import N5ChunkWrapper
import numpy as np
shape = (100, 100, 100)
types = [
np.dtype('>u2'), # unsigned 16 bit big endian
np.dtype('<u2'), # unsigned 16 bit little endian
]
for t in types:
    data = np.ones(shape, dtype=t)
    n5_encoder = N5ChunkWrapper(data.dtype, shape, compressor=numcodecs.Zstd())
    encoded_data = n5_encoder.encode(data)
    decoded_data = n5_encoder.decode(encoded_data)
    print(t, t.byteorder, type(decoded_data))
"""
>u2 > <class 'bytes'>
uint16 = <class 'numpy.ndarray'>
"""
decode calls this method, which, if the input is big-endian, hits this conditional and returns early, before converting the bytes to a NumPy array.
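A possible reason uint8 ends up on the same path as the big-endian dtype: NumPy reports the byte order of single-byte dtypes as '|' (not applicable), so a check that only accepts '<' or a native '=' would not classify uint8 as little-endian. The sketch below illustrates this with NumPy alone. The helper looks_little_endian is hypothetical and written for illustration; it is an assumption about the shape of such a check, not a copy of zarr's code.

```python
import sys

import numpy as np


def looks_little_endian(dtype):
    """Hypothetical endianness check, assumed for illustration only."""
    dt = np.dtype(dtype)
    # '<' is explicit little-endian; '=' is native order, which is
    # little-endian only on a little-endian machine.
    return dt.byteorder == "<" or (dt.byteorder == "=" and sys.byteorder == "little")


for candidate in (np.uint8, np.float32, ">u2", "<u2"):
    dt = np.dtype(candidate)
    # Print the dtype, its byte-order character, and the check's verdict.
    print(dt, repr(dt.byteorder), looks_little_endian(dt))

# Single-byte dtypes have no byte order, so NumPy uses '|' for them.
uint8_order = np.dtype(np.uint8).byteorder
```

If the wrapper's check behaves like this helper, uint8 would take the early-return branch even though no byte swap is needed for it.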
Thanks @d-v-b, adding
np.frombuffer(data, data.dtype.newbyteorder(">"))
solved the problem.
But why is the condition there: if not self._little_endian?
I would ask the author of the code; he works in the same building as you :)
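For anyone who cannot patch the library, the np.frombuffer fix mentioned above can also be applied on the caller's side. This is a minimal sketch, not the library's API: as_ndarray is a hypothetical helper, and it assumes that any raw bytes returned by decode hold the chunk in big-endian order, which is how N5 stores data.

```python
import numpy as np


def as_ndarray(decoded, dtype, shape):
    """Hypothetical helper: normalise a decoded chunk to an ndarray."""
    dtype = np.dtype(dtype)
    if isinstance(decoded, np.ndarray):
        # Already converted by the codec; only the shape needs restoring.
        return decoded.reshape(shape)
    # Assumption: raw bytes are big-endian, so read them as such and then
    # convert to the requested dtype (a no-op swap for single-byte dtypes).
    arr = np.frombuffer(decoded, dtype=dtype.newbyteorder(">"))
    return arr.astype(dtype).reshape(shape)


# Stand-in for bytes returned by decode: six big-endian uint16 values.
raw = np.arange(6, dtype=">u2").tobytes()
restored = as_ndarray(raw, ">u2", (2, 3))

# The same helper works for uint8, where byte order does not apply.
raw_u8 = np.ones(4, dtype=np.uint8).tobytes()
restored_u8 = as_ndarray(raw_u8, np.uint8, (2, 2))
```

In the reproduction script, this would replace the bare decoded_data.reshape(shape) call, so the comparison works whether decode returns bytes or an array.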