`ShuffleFilter` fails to round trip
```julia
julia> using Zarr

julia> codec = Zarr.ShuffleFilter(elementsize=4)
Zarr.ShuffleFilter(0x0000000000000004)

julia> Zarr.zdecode(Zarr.zencode(UInt8[0x05], codec), codec)
1-element Vector{UInt8}:
 0xe0
```
From what I can tell, the shuffle filter is missing the "Add leftover to the end of data" step from https://github.com/HDFGroup/hdf5/blob/f2642985d8c23ff7e876c6228c7cc0cf20515923/src/H5Zshuffle.c#L279-L284
@mkitti, am I reading that HDF5 code correctly? And do you know whether appending the leftover bytes at the end after the shuffle is standard practice? I can't find anywhere that this is documented.
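To make the leftover step concrete, here is a minimal Julia sketch of HDF5-style shuffling, assuming the behavior in the linked `H5Zshuffle.c` lines: whole elements are split into byte planes, and any trailing bytes that do not fill a whole element are copied unchanged to the end. `hdf5_shuffle` and `hdf5_unshuffle` are hypothetical helper names, not part of Zarr.jl or HDF5.

```julia
# Hypothetical helpers sketching HDF5-style byte shuffling (not Zarr.jl's API).
function hdf5_shuffle(data::AbstractVector{UInt8}, elsize::Integer)
    n = length(data)
    nelem = n ÷ elsize                 # number of whole elements
    out = Vector{UInt8}(undef, n)
    for j in 0:elsize-1, i in 0:nelem-1
        # byte j of element i is written into byte plane j
        out[j*nelem + i + 1] = data[i*elsize + j + 1]
    end
    # "Add leftover to the end of data": trailing bytes are copied as-is
    for k in nelem*elsize+1:n
        out[k] = data[k]
    end
    return out
end

function hdf5_unshuffle(data::AbstractVector{UInt8}, elsize::Integer)
    n = length(data)
    nelem = n ÷ elsize
    out = Vector{UInt8}(undef, n)
    for j in 0:elsize-1, i in 0:nelem-1
        # gather byte j of element i back out of byte plane j
        out[i*elsize + j + 1] = data[j*nelem + i + 1]
    end
    # the leftover bytes were never shuffled, so copy them back unchanged
    for k in nelem*elsize+1:n
        out[k] = data[k]
    end
    return out
end

hdf5_unshuffle(hdf5_shuffle(UInt8[0x05], 4), 4) == UInt8[0x05]  # true
hdf5_shuffle(UInt8.(1:10), 4)  # [1, 5, 2, 6, 3, 7, 4, 8, 9, 10]
```

Under this behavior the one-byte input from the report round trips: it contains no whole 4-byte element, so the shuffle loops do nothing and the single byte is copied straight through.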
Shuffling under Zarr should error if the input array byte count is not a multiple of the element size.
https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/shuffle.py
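The strict behavior described above can be sketched in Julia as follows. This is a hypothetical helper, `strict_shuffle`, not code from numcodecs or Zarr.jl; it assumes the codec should reject any input whose byte count is not a whole number of elements rather than silently handling the tail.

```julia
# Hypothetical strict shuffle (not numcodecs' or Zarr.jl's implementation):
# reject inputs that do not contain a whole number of elements.
function strict_shuffle(data::AbstractVector{UInt8}, elsize::Integer)
    n = length(data)
    if n % elsize != 0
        throw(ArgumentError("byte count $n is not a multiple of element size $elsize"))
    end
    # Each column of the elsize×nelem matrix is one element; permuting the
    # dimensions and flattening emits byte 0 of every element, then byte 1, ...
    return vec(permutedims(reshape(collect(data), Int(elsize), n ÷ Int(elsize))))
end

strict_shuffle(UInt8.(1:8), 4)  # [1, 5, 2, 6, 3, 7, 4, 8]
# strict_shuffle(UInt8[0x05], 4) throws an ArgumentError
```

Failing loudly keeps encode and decode trivially inverse to each other, at the cost of not accepting the partial-element inputs that HDF5 tolerates.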
HDF5 filter implementations should not be assumed to be compatible with their Zarr counterparts.
Additionally, Zarr v2 codecs and Zarr v3 codecs may have subtly different behaviors and defaults.
Interesting. I think the shuffle filter was originally supposed to be compatible with HDF5 (ref: https://github.com/fsspec/kerchunk/issues/11), but they took the implementation from https://github.com/HDFGroup/hsds/blob/03890edfa735cc77da3bc06f6cf5de5bd40d1e23/hsds/util/storUtil.py#L43
I've tested in https://github.com/nhz2/ChunkCodecs.jl/pull/6 that HDF5 copies the remaining bytes to the end when the data length is not evenly divisible by the element size. For example, "12312312312345" with element size 3 is byte-shuffled to "11112222333345".
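The worked example above can be reproduced with a short reshape-based Julia sketch. `shuffle_with_leftover` is a hypothetical name, and this is an illustration of the observed HDF5 behavior rather than HDF5's or ChunkCodecs.jl's code: the 14 bytes hold 4 whole 3-byte elements (12 bytes), which are split into byte planes, and the 2 leftover bytes `"45"` are appended unchanged.

```julia
# Hypothetical reshape-based sketch of HDF5-style shuffling with leftover bytes.
function shuffle_with_leftover(data::AbstractVector{UInt8}, elsize::Int)
    nelem = length(data) ÷ elsize   # whole elements
    whole = nelem * elsize          # bytes covered by whole elements
    # Columns of the elsize×nelem matrix are elements; permutedims + vec
    # emits all first bytes, then all second bytes, and so on.
    planes = vec(permutedims(reshape(data[1:whole], elsize, nelem)))
    # Bytes that do not form a whole element are appended as-is.
    return vcat(planes, data[whole+1:end])
end

bytes = collect(codeunits("12312312312345"))
String(shuffle_with_leftover(bytes, 3))  # "11112222333345"
```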