
Error when slicing in zarr array

Open fcouziniedevy opened this issue 3 months ago • 3 comments

I am using this package to compress images in a zarr array. If I try to access multiple images at once by slicing the array, the decoders crash (I tried the Png and Jpeg compressors). I am not sure where this issue belongs, but it seems more logical to post it in this repository than in zarr or numcodecs.

As a quick summary, when using image compressors in a zarr array, integer indexing works fine: imgs[0, :, :, :], but slicing does not: imgs[:2, :, :, :] or imgs[:, :, :, :]. The zarr documentation suggests that slicing should work. This is not a blocking issue, since the images can still be accessed with integer indexing (or with a list of integers through oindex), but it might be worth fixing, in particular for the simple [:] slicing. My guess is that, for a slice, the decoder receives a single buffer containing multiple images and tries to decode it as if it were a single image, but I am not sure how to fix it.
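As a stopgap, the integer-indexing workaround described above can be wrapped in a small helper. This is only a sketch: `read_slice_by_index` is a hypothetical name, not part of zarr or imagecodecs, and it assumes that the array supports `len()` and integer indexing along its first axis, so that each chunk is decoded on its own.

```python
def read_slice_by_index(arr, sl):
    """Read the items selected by slice `sl` one integer index at a time.

    Hypothetical helper: `arr` is assumed to support len() and integer
    indexing on its first axis, so no slice ever reaches the decoder.
    """
    n = len(arr)
    # slice.indices() resolves None and negative bounds against the length.
    return [arr[i] for i in range(*sl.indices(n))]


# Stand-in data; with a zarr array each item would be one decoded image.
fake_images = [[0, 0], [1, 1], [2, 2]]
first_two = read_slice_by_index(fake_images, slice(None, 2))
print(len(first_two))
```

With a real zarr array, the returned list could then be combined with `np.stack` to get the same shape that `imgs[:2]` would have returned.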

Here is a more complete script to replicate the problem (with zarr 2.16.1, numcodecs 0.12.1 and imagecodecs 2024.1.1). The Png compressor raises a ValueError, but the Jpeg one causes a segfault.

import traceback

from imagecodecs.numcodecs import Jpeg, Png
from numcodecs import Blosc
import numpy as np
import zarr


zarr.register_codec(Jpeg)
zarr.register_codec(Png)
N_IMAGES = 5
IMG_SIZE = 32, 32, 3

def create_images_array(compressor):
    imgs = np.zeros((N_IMAGES, *IMG_SIZE), dtype=np.uint8)
    arr = zarr.array(data=imgs,
                     dtype="uint8",
                     chunks=(1, *IMG_SIZE),
                     compressor=compressor)
    return arr


def test_zarr_array_slicing(arr):
    print(f"### Array of shape {arr.shape} with compressor "
          f"{arr.compressor.__class__.__name__} ###")
    # Integer indexing is valid for every compressor
    print("  integer indexing:", end=" ")
    arr[0, ...]
    arr[1, ...]
    arr.get_orthogonal_selection(([0, 1], slice(-1), slice(-1), slice(-1)))
    print("ok")

    # Slicing crashes with the imagecodecs numcodecs compressors
    print("  slicing: ", end="")
    arr[:]
    arr[:1, ...]
    np.asarray(list(arr.islice(0, 2)))
    print("ok")


if __name__ == "__main__":
    for compressor in [Blosc(), Png(), Jpeg()]:
        try:
            imgs = create_images_array(compressor)
            test_zarr_array_slicing(imgs)
        except ValueError:
            print("Error")
            # print_exc() writes the traceback itself and returns None,
            # so it must not be wrapped in print().
            traceback.print_exc()
        print()

Thank you for this package, it is really useful and well coded.

fcouziniedevy, Mar 13 '24 10:03

Thanks for reporting. I thought slicing was tested and that output buffers were checked for shape and strides. I'll have to debug this in detail...

cgohlke, Mar 13 '24 15:03

After some tests, it seems that the difference between the code in the pytest test and mine is the tiling of the image in the test (into 128x128 crops). If I set the chunks to the exact image shape in the test, it also fails. No idea why though...

fcouziniedevy, Mar 13 '24 17:03

I think I found the culprit. All codecs using strided decoding (PNG, WebP, JPEG) fail if the output buffer does not have exactly the expected shape but instead has, for example, the shape (1, *shape) returned by slicing. In that case out.strides[1] should be used as the line stride instead of out.strides[0].
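To illustrate the stride mismatch described above, here is a pure-Python sketch that computes C-contiguous byte strides the way NumPy reports them in `ndarray.strides`. The helpers `c_contiguous_strides` and `line_stride` are hypothetical names used for illustration only; this is not the imagecodecs implementation or its actual fix.

```python
def c_contiguous_strides(shape, itemsize=1):
    """Byte strides of a C-contiguous buffer with the given shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def line_stride(out_shape, image_ndim=3, itemsize=1):
    """Bytes between consecutive image rows in the output buffer.

    Assumption: the image occupies the last `image_ndim` axes, so any
    leading length-1 axes added by slicing are skipped.
    """
    strides = c_contiguous_strides(out_shape, itemsize)
    return strides[len(out_shape) - image_ndim]


image_shape = (32, 32, 3)  # uint8 image, as in the script above

print(c_contiguous_strides(image_shape))        # (96, 3, 1)
print(c_contiguous_strides((1, *image_shape)))  # (3072, 96, 3, 1)
```

For the plain image buffer, strides[0] is 96 bytes, one row of 32 pixels with 3 channels. For the (1, 32, 32, 3) buffer produced by slicing, strides[0] is 3072 bytes, the size of the whole image, and the row stride moves to strides[1]. A decoder that always reads strides[0] would therefore step a full image per row and write past the end of the buffer.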

cgohlke, Mar 13 '24 18:03