Support zero-padding chunk indices when generating chunk keys

Open alimanfoo opened this issue 4 years ago • 3 comments

In the current v3 core protocol draft, chunk keys are formed by concatenating chunk indices without any zero padding, e.g., "0.0", "100.200", etc. However, this means chunk files/objects do not sort lexically in chunk-index order (for example, "10.0" sorts before "2.0"), and a lexical sort can be convenient when accessing zarr data via generic tools. A lexical sort could be achieved with zero padding, e.g., "0.0" becomes "000.000". This is hard to generalise, because fixing the number of digits to pad to would constrain the number of chunks along any dimension, and it is impossible in general to know ahead of time how many chunks are needed, given that array dimensions can be resized. However, it might be possible to add this as an option, expecting that it is not the default but may in some circumstances be specified by the user.
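
To make the sorting problem concrete, here is a minimal sketch in Python. The helper `chunk_key` and its `width` and `sep` parameters are hypothetical names used for illustration only; they are not part of the v3 draft or of any zarr API.

```python
def chunk_key(indices, width=0, sep="."):
    """Join chunk indices into a key, zero-padding each index to `width` digits.

    Hypothetical helper: width=0 means no padding (the current draft behaviour).
    """
    parts = []
    for i in indices:
        if i < 0:
            raise ValueError(f"chunk index must be non-negative, got {i}")
        s = str(i).zfill(width)
        # A fixed width caps the number of chunks along each dimension.
        if width and len(s) > width:
            raise ValueError(f"chunk index {i} does not fit in {width} digits")
        parts.append(s)
    return sep.join(parts)


# Chunk indices listed in numeric order.
grid = [(0, 0), (2, 0), (10, 0), (100, 200)]

unpadded = sorted(chunk_key(idx) for idx in grid)
padded = sorted(chunk_key(idx, width=3) for idx in grid)

print(unpadded)  # ['0.0', '10.0', '100.200', '2.0']
print(padded)    # ['000.000', '002.000', '010.000', '100.200']
```

With `width=3` the sorted keys follow chunk-index order, but an index of 1000 or more is rejected, which is the constraint on resizable arrays described above.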

alimanfoo, May 01 '20 15:05

This seems like a good idea. One question that comes up though is how appending would be handled.

jakirkham, May 01 '20 16:05

Generating keys whose lexicographic order matches the numeric sort order is indeed a good idea. You can zero-pad up to the maximum possible length (e.g. assuming a 64-bit index), but that is rather long. Alternatively, there are variable-length encodings (prefixing the length somehow), but they sacrifice readability and simplicity.
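
A rough sketch of both options, in Python. The function names are hypothetical, and the letter-based length prefix is just one possible variable-length scheme chosen for illustration; it is not a scheme proposed in this thread or in the spec.

```python
# Widest decimal representation of an unsigned 64-bit index.
MAX_DIGITS_U64 = len(str(2**64 - 1))  # 20 digits


def pad_to_max(index):
    """Zero-pad to the full unsigned 64-bit width (hypothetical helper)."""
    return str(index).zfill(MAX_DIGITS_U64)


def length_prefixed(index):
    """Prefix the decimal digits with a letter encoding their count.

    Assumed scheme for illustration: 'a' means 1 digit, 'b' means 2 digits,
    and so on, so shorter numbers sort before longer ones and numbers of
    equal length compare digit by digit.
    """
    digits = str(index)
    if index < 0 or len(digits) > MAX_DIGITS_U64:
        raise ValueError(f"index out of unsigned 64-bit range: {index}")
    return chr(ord("a") + len(digits) - 1) + digits


indices = [0, 2, 10, 100, 2**64 - 1]

print([pad_to_max(i) for i in indices[:3]])
print([length_prefixed(i) for i in indices[:4]])  # ['a0', 'a2', 'b10', 'c100']
```

Both encodings sort lexically in numeric order over the whole unsigned 64-bit range. The first yields 20-character components for every index, and the second is compact but harder to read at a glance, which is the trade-off noted above.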

jbms, Feb 22 '22 07:02

Zero-padding up to a user-specified length seems like a good extension. I'm not sure this needs to be part of the core, though; I think it could be added as a storage-transformer extension later.

jstriebel, Nov 16 '22 17:11