Fix iterating over sharding index
For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of morton code will produce indices that are not within the shape. Therefore reading the sharding index will drop some chunks.
I was wondering if using morton code is a significant efficiency improvement over normal indexing like np.unravel_index
For some specific shapes of
chunks_per_shard(e.g.[5,2]), the current implementation of morton code will produce indices that are not within the shape.
Is this a bug, or something fundamental to morton coding?
@d-v-b As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.
I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like np.unravel_index.
@d-v-b As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.
I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like
np.unravel_index.
I thought the use of the morton ordering was motivated by the desire to place spatially contiguous chunks close together in a shard. The simpler linearization algorithms will not allow this. Maybe it's better to fix the bug in the morton encoding?
@d-v-b The morton encoding was actually right. The problem was that morton ordering by is designed to fill arrays with power-of-two shapes. Therefore, it would sometimes return a coordinate outside the array. I implemented that these coordinates are skipped so that most of the time the morton ordering is maintained and only at the edge a few jumps can occur.