[v3] support for ragged arrays
Zarr-Python 2 supported ragged arrays. This functionality has not made it into Zarr-Python 3 yet (see also #2617).
An example demonstrating this functionality using Zarr-Python 2:
>>> import numcodecs
>>> import numpy as np
>>> import zarr
>>> z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
>>> z
<zarr.core.Array (4,) object>
>>> z.filters
[VLenArray(dtype='<i8')]
>>> z[0] = np.array([1, 3, 5])
>>> z[1] = np.array([4])
>>> z[2] = np.array([7, 9, 14])
>>> z[:]
array([array([1, 3, 5]), array([4]), array([ 7,  9, 14]),
       array([], dtype=int64)], dtype=object)
This issue tracks the development of ragged arrays support in Zarr-Python 3.
Just hopping into this discussion, but this does limit the ability of HyperSpy to support Zarr 3.0.0. Our use case is ragged arrays, which should be supported and which don't have the same security issues as directly JSON-encoding a Python object.
We could just unwrap the ragged arrays and store them alongside a second array holding the information needed to recreate the ragged array. Is that the best way to handle this, or is there a better way to encode variable-length objects?
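For concreteness, here is a minimal sketch of that workaround, assuming 1-D integer rows. The helper names `flatten_ragged` and `unflatten_ragged` are hypothetical and not part of Zarr or NumPy. The ragged data is packed into one flat `values` array plus an `offsets` array marking where each row starts and ends; both are ordinary fixed-dtype arrays, so each could in principle be stored as a regular Zarr array.

```python
import numpy as np


def flatten_ragged(rows):
    """Pack a sequence of 1-D arrays into (values, offsets).

    Hypothetical helper: row i lives in values[offsets[i]:offsets[i + 1]].
    """
    lengths = np.array([len(r) for r in rows], dtype=np.int64)
    # offsets has one more entry than there are rows and always starts at 0.
    offsets = np.concatenate(([0], np.cumsum(lengths))).astype(np.int64)
    if len(rows):
        values = np.concatenate([np.asarray(r) for r in rows])
    else:
        # Assumption: default to int64 when there is nothing to infer from.
        values = np.empty(0, dtype=np.int64)
    return values, offsets


def unflatten_ragged(values, offsets):
    """Rebuild an object array of 1-D arrays from (values, offsets)."""
    out = np.empty(len(offsets) - 1, dtype=object)
    for i in range(len(out)):
        out[i] = values[offsets[i]:offsets[i + 1]]
    return out


# Same data as the Zarr-Python 2 example at the top of this issue.
rows = [
    np.array([1, 3, 5]),
    np.array([4]),
    np.array([7, 9, 14]),
    np.array([], dtype=np.int64),
]
values, offsets = flatten_ragged(rows)
restored = unflatten_ragged(values, offsets)
```

The trade-off is that readers have to know about the pairing convention: nothing in the two stored arrays says that one indexes into the other, so that would need to be recorded separately (for example in attributes).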
I think ragged arrays are definitely in-scope for 3.x, we just haven't had time to implement it.
@d-v-b Thanks for the response! There is the VLenBytesCodec, which seems like it could handle most of the encoding as long as the underlying array is 1-dimensional? The underlying source says that this might change in the future and that it is not explicitly supported in v3. Is that still correct?
Any updates on this?
In https://github.com/rabernat/zarr-python/pull/1 we are developing an experimental prototype that allows any Arrow data type to be stored in Zarr. This would enable ragged arrays using Arrow list types.