zarr-python Fix iterating over sharding index

For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of morton code will produce indices that are not within the shape. Therefore reading the sharding index will drop some chunks. I was wondering if using morton code is a significant efficiency improvement over normal indexing like np.unravel_index

Oct 16 '24 20:10 brokkoli71

For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of morton code will produce indices that are not within the shape.

Is this a bug, or something fundamental to morton coding?

Oct 16 '24 20:10 d-v-b

@d-v-b As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.

I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like np.unravel_index.

Oct 16 '24 20:10 brokkoli71

@d-v-b As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.

I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like np.unravel_index.

I thought the use of the morton ordering was motivated by the desire to place spatially contiguous chunks close together in a shard. The simpler linearization algorithms will not allow this. Maybe it's better to fix the bug in the morton encoding?

Oct 16 '24 20:10 d-v-b

@d-v-b The morton encoding was actually right. The problem was that morton ordering by is designed to fill arrays with power-of-two shapes. Therefore, it would sometimes return a coordinate outside the array. I implemented that these coordinates are skipped so that most of the time the morton ordering is maintained and only at the edge a few jumps can occur.

Oct 22 '24 10:10 brokkoli71