zarr-python

N5ChunkWrapper bug with uint8

Open mzouink opened this issue 6 months ago • 4 comments

Zarr version

v2.18.4

Numcodecs version

0.51.1

Python Version

3.11.11

Operating System

Linux

Installation

conda-forge

Description

encode/decode doesn't round-trip correctly with uint8. I tested it with multiple dtypes, and all of them work except uint8.

Steps to reproduce

import numcodecs
from zarr.n5 import N5ChunkWrapper
import numpy as np

shape = (100, 100, 100)

types = [
    np.float32,
    np.uint8,
    np.int64,
    np.float16
    ]

for t in types:
    try:
        data = np.ones(shape, dtype=t)

        n5_encoder = N5ChunkWrapper(data.dtype, shape, compressor=numcodecs.Zstd())

        encoded_data = n5_encoder.encode(data)
        decoded_data = n5_encoder.decode(encoded_data)
        decoded_data = decoded_data.reshape(shape)

        if not np.array_equal(data, decoded_data):
            print(f"Decoded data does not match original data - {t.__name__}")
    except Exception as e:
        print(f"Error processing data of type {t.__name__}: {e}")

mzouink, Jun 25 '25 20:06

@d-v-b @bogovicj any idea why this is happening?

mzouink, Jun 25 '25 20:06

Here's a clearer example:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr==2.18",
#   "numcodecs<0.16"
# ]
# ///

import numcodecs
from zarr.n5 import N5ChunkWrapper
import numpy as np
shape = (100, 100, 100)

types = [
    np.dtype('>u2'), # unsigned 16 bit big endian
    np.dtype('<u2'), # unsigned 16 bit little endian
    ]

for t in types:
    data = np.ones(shape, dtype=t)

    n5_encoder = N5ChunkWrapper(data.dtype, shape, compressor=numcodecs.Zstd())

    encoded_data = n5_encoder.encode(data)
    decoded_data = n5_encoder.decode(encoded_data)
    print(t, t.byteorder, type(decoded_data))
"""
>u2 > <class 'bytes'>
uint16 = <class 'numpy.ndarray'>
"""

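decode calls a helper method which, if the dtype is not flagged as little-endian, hits the `if not self._little_endian` conditional and returns early, before converting the bytes to a numpy array. That is why the big-endian case above comes back as `bytes` rather than as a `numpy.ndarray`.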
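To see why uint8 can end up in the same branch as an explicitly big-endian dtype, it may help to look at numpy's byte-order codes. The sketch below uses only numpy. The `is_little_endian` helper is hypothetical: it is an assumption about how such a flag could be computed, not code copied from zarr.

```python
import sys

import numpy as np


def is_little_endian(dtype) -> bool:
    """Hypothetical check (an assumption, not zarr's code): is dtype little-endian?"""
    dt = np.dtype(dtype)
    # "<" is explicit little-endian, "=" is native order, and
    # "|" means byte order does not apply (single-byte types such as uint8).
    return dt.byteorder == "<" or (dt.byteorder == "=" and sys.byteorder == "little")


for t in (np.float32, np.uint8, np.int64, np.float16, ">u2", "<u2"):
    dt = np.dtype(t)
    print(dt.name, repr(dt.byteorder), is_little_endian(dt))
```

Under this assumption, uint8 reports the byte order `"|"`, so it is not counted as little-endian and would take the same early return as `>u2`.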

d-v-b, Jun 25 '25 20:06

Thanks @d-v-b, adding

np.frombuffer(data, data.dtype.newbyteorder(">"))

solved the problem.
But why is the condition `if not self._little_endian` there?
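As a caller-side workaround, the same idea can be sketched as a small wrapper that accepts either return type. The `as_ndarray` helper is hypothetical. It assumes the `bytes` returned by decode are the raw, already decompressed chunk payload in big-endian order, and that `dtype` and `shape` are the ones passed to the wrapper.

```python
import numpy as np


def as_ndarray(decoded, dtype, shape):
    """Hypothetical helper: coerce a decode() result into an array of `dtype`."""
    dtype = np.dtype(dtype)
    if isinstance(decoded, np.ndarray):
        # Already an array; only the shape needs restoring.
        return decoded.reshape(shape)
    # Assumption: the raw bytes are big-endian, so read them that way
    # and then convert to the requested dtype.
    arr = np.frombuffer(decoded, dtype=dtype.newbyteorder(">"))
    return arr.astype(dtype).reshape(shape)


# Usage with hand-made payloads standing in for decode() output.
raw_u2 = np.arange(6, dtype=">u2").tobytes()
raw_u8 = np.ones(6, dtype=np.uint8).tobytes()
print(as_ndarray(raw_u2, ">u2", (2, 3)))
print(as_ndarray(raw_u8, np.uint8, (2, 3)))
```

For single-byte dtypes such as uint8, `newbyteorder(">")` leaves the dtype unchanged, so the same code path covers both cases.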

mzouink, Jun 26 '25 15:06

I would ask the author of the code, he works in the same building as you :)

d-v-b, Jun 26 '25 15:06