zarr-specs
Support zero-padding chunk indices when generating chunk keys
In the current v3 core protocol draft, chunk keys are formed by concatenating chunk indices without any zero padding, e.g., "0.0", "100.200", etc. However, this means chunk files/objects do not sort lexically in index order, and such an ordering can be convenient when accessing Zarr data via generic tools. A lexical sort could be achieved with zero padding, e.g., "0.0" becomes "000.000". This is hard to generalise, because fixing the number of digits to pad to would constrain the number of chunks along any dimension, and it is impossible in general to know ahead of time how many chunks will be needed, given that arrays can be resized. However, it might be possible to add this as an option, with the expectation that it is not the default but may be specified by the user in some circumstances.
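To make the sorting problem concrete, here is a minimal Python sketch. The `chunk_key` function and its `pad_width` parameter are hypothetical names used only for illustration; they are not part of the v3 draft or of any Zarr implementation.

```python
def chunk_key(indices, pad_width=0, sep="."):
    """Join chunk indices into a key, zero-padding each index to pad_width digits.

    Hypothetical helper: pad_width=0 reproduces the unpadded keys from the draft.
    """
    parts = []
    for index in indices:
        digits = str(index)
        # A fixed pad width caps the number of chunks along each dimension.
        if pad_width and len(digits) > pad_width:
            raise ValueError(f"chunk index {index} does not fit in {pad_width} digits")
        parts.append(digits.zfill(pad_width))
    return sep.join(parts)


grid = [(0, 0), (2, 0), (10, 0), (100, 200)]

# Unpadded keys: lexical order differs from index order.
unpadded = sorted(chunk_key(idx) for idx in grid)
print(unpadded)  # ['0.0', '10.0', '100.200', '2.0']

# Padded keys: lexical order matches index order.
padded = sorted(chunk_key(idx, pad_width=3) for idx in grid)
print(padded)  # ['000.000', '002.000', '010.000', '100.200']
```

With `pad_width=3`, an index of 1000 raises `ValueError`, which is the resizing constraint described above: the pad width has to be chosen before the final number of chunks is known.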
This seems like a good idea. One question that comes up though is how appending would be handled.
Generating keys whose lexicographic order matches the numeric order of the chunk indices is indeed a good idea. You can zero-pad up to the maximum possible length (e.g. assuming a 64-bit index), but that is rather long. Alternatively, there are variable-length encodings (prefixing the length somehow), but they sacrifice readability and simplicity.
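For comparison, here is a rough Python sketch of both alternatives. The function names and the single-letter length prefix are assumptions made for illustration; no particular encoding has been proposed in this thread.

```python
# Widest decimal representation of an unsigned 64-bit index (20 digits).
MAX_UINT64_DIGITS = len(str(2**64 - 1))


def pad_max(index):
    """Zero-pad to the full width of an unsigned 64-bit integer."""
    if not 0 <= index < 2**64:
        raise ValueError("index must fit in an unsigned 64-bit integer")
    return str(index).zfill(MAX_UINT64_DIGITS)


def length_prefixed(index):
    """Prefix the digits with one letter encoding the digit count.

    Illustrative scheme only: 'a' means 1 digit, 'b' means 2 digits, and so on
    up to 't' for 20 digits.
    """
    if not 0 <= index < 2**64:
        raise ValueError("index must fit in an unsigned 64-bit integer")
    digits = str(index)
    return chr(ord("a") + len(digits) - 1) + digits


for i in (0, 2, 10, 100):
    print(pad_max(i), length_prefixed(i))
```

Both encodings sort lexically in numeric order for any index up to 2**64 - 1. The first produces 20-character components even for small arrays, and the second yields components such as `a0`, `b10`, and `c100`, which are shorter but less obvious to read.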
Zero-padding up to a user-specified length seems like a good extension. I'm not sure this needs to be part of the core, though; I think it could be added later as a storage-transformer extension.