zarr-python

Fix iterating over sharding index

Open brokkoli71 opened this issue 1 year ago • 4 comments

For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of the Morton code produces indices that lie outside the shape. As a result, reading the sharding index drops some chunks. I was wondering whether using the Morton code is a significant efficiency improvement over normal indexing like np.unravel_index?
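To illustrate the problem, here is a minimal sketch. It is not the actual zarr-python code: `decode_morton` below is a hypothetical stand-in that de-interleaves bits in the usual "compressed Morton" way, and the naive iteration simply decodes the first `prod(shape)` codes.

```python
from itertools import product


def decode_morton(z, shape):
    """Decode Morton code z into a coordinate tuple (illustrative helper)."""
    # Number of bits needed per dimension to address indices 0..n-1.
    bits = [(n - 1).bit_length() for n in shape]
    out = [0] * len(shape)
    input_bit = 0
    # Hand out the bits of z round-robin to the dimensions that still need bits.
    for coord_bit in range(max(bits, default=0)):
        for dim in range(len(shape)):
            if coord_bit < bits[dim]:
                out[dim] |= ((z >> input_bit) & 1) << coord_bit
                input_bit += 1
    return tuple(out)


shape = (5, 2)
n_chunks = shape[0] * shape[1]

# Naive iteration: decode only the first n_chunks Morton codes.
naive = [decode_morton(z, shape) for z in range(n_chunks)]

# Coordinates that fall outside the shape.
out_of_bounds = [c for c in naive if any(i >= n for i, n in zip(c, shape))]

# Valid coordinates that the naive iteration never reaches.
all_coords = set(product(*(range(n) for n in shape)))
missing = sorted(all_coords - set(naive))

print(out_of_bounds)  # [(5, 0)]
print(missing)        # [(4, 1)]
```

In this sketch the code z = 9 decodes to (5, 0), which is outside a [5,2] grid, while the valid chunk (4, 1) only appears at z = 10 and is never visited.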

brokkoli71 avatar Oct 16 '24 20:10 brokkoli71

> For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of the Morton code produces indices that lie outside the shape.

Is this a bug, or something fundamental to Morton coding?

d-v-b avatar Oct 16 '24 20:10 d-v-b

@d-v-b As far as I understand, Morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.

I wonder if it's worth investigating further how to fix it, or if we should just use a simple iteration algorithm like np.unravel_index.

brokkoli71 avatar Oct 16 '24 20:10 brokkoli71

> @d-v-b As far as I understand, Morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.
>
> I wonder if it's worth investigating further how to fix it, or if we should just use a simple iteration algorithm like np.unravel_index.

I thought the use of the Morton ordering was motivated by the desire to place spatially contiguous chunks close together in a shard. The simpler linearization algorithms will not allow this. Maybe it's better to fix the bug in the Morton encoding?

d-v-b avatar Oct 16 '24 20:10 d-v-b

@d-v-b The Morton encoding was actually right. The problem was that Morton ordering is designed to fill arrays with power-of-two shapes. Therefore, it would sometimes return a coordinate outside the array. I changed the iteration to skip these coordinates, so the Morton ordering is maintained most of the time and only a few jumps occur at the edges.
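Roughly, the skipping idea looks like the sketch below. This is a simplified illustration rather than the exact code in the fix; the names `decode_morton` and `morton_order_iter` and the choice to walk the whole power-of-two bounding box are assumptions made for the example.

```python
def decode_morton(z, shape):
    """Decode Morton code z into a coordinate tuple (illustrative helper)."""
    bits = [(n - 1).bit_length() for n in shape]
    out = [0] * len(shape)
    input_bit = 0
    for coord_bit in range(max(bits, default=0)):
        for dim in range(len(shape)):
            if coord_bit < bits[dim]:
                out[dim] |= ((z >> input_bit) & 1) << coord_bit
                input_bit += 1
    return tuple(out)


def morton_order_iter(shape):
    """Yield every coordinate inside shape in Morton order, skipping the rest."""
    # Total number of codes in the power-of-two box that encloses the shape.
    total_bits = sum((n - 1).bit_length() for n in shape)
    for z in range(1 << total_bits):
        coord = decode_morton(z, shape)
        # Skip coordinates that fall outside the (non-power-of-two) shape.
        if all(i < n for i, n in zip(coord, shape)):
            yield coord


order = list(morton_order_iter((5, 2)))
print(order)
# [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (3, 0), (2, 1), (3, 1), (4, 0), (4, 1)]
```

For [5,2] every chunk is now visited exactly once; the only deviation from a pure Morton walk is the skipped code between (4, 0) and (4, 1).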

brokkoli71 avatar Oct 22 '24 10:10 brokkoli71