zarr-python
zarr-python copied to clipboard
indexing with list not supported
Minimal, reproducible code sample, a copy-pastable example if possible
import numpy as np
import zarr
rows = [3, 4]
a = np.arange(20).reshape((10, 2))
print(a[rows]) # OK
z = zarr.array(a)
print(z[rows]) # indexerror
Problem description
Based on this I expected this to work and to return the same rows as numpy. But it raises in index error.
Version and installation information
Please provide the following:
- Value of
zarr.__version__: 2.11.3
Hi @brentp. Which cells are you expecting to receive? e.g.
In [1]: import zarr, numpy as np
In [2]: z = zarr.array(np.arange(20).reshape(10,2))
In [3]: z[:]
Out[3]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]])
In [4]: z[[0, 1], [1, 0]]
Out[4]: array([1, 2])
Hi, I expect to get the same result as numpy, but index it's an indexerror. I have updated the example to be more complete with this code:
import numpy as np
import zarr
rows = [3, 4]
a = np.arange(20).reshape((10, 2))
print(a[rows])
z = zarr.array(a)
print(z[rows])
Ah, I see. I remember there being limitations but I don't remember this one. @jni?
Zarr also supports oindex and vindex, which may be another way to get at the functionality needed here
z.oindex[rows] certainly returns the same values, but vindex is being used by z[rows].
There's some overlap between these which can be a kind of confusing aspect of advanced indexing. Hence why vindex & oindex were added before advanced indexing was
Yes, basically as I remember it @shoyer raised the point that mixing integer indexing with slices (which this use case implicitly uses) can be tricky to get right, so we left that out of the fancy indexing implementation in #725 for a braver soul to tackle. :stuck_out_tongue_winking_eye: The discussion starts here:
https://github.com/zarr-developers/zarr-python/pull/725#issuecomment-832192992
As I see it, the next "easy" step is to allow leading integer indexing, which won't reorder axes in either NumPy or zarr.Array.vindex (I think!). What do you think about this @joshmoore @jakirkham @shoyer?
Sorry, I failed to note that no one else has responded. I'm all for incremental steps, but maybe there's a question of how to document the rest of the steps needed for the likes of @brentp.
Having apparently killed the conversation, I'm inclined to say, "Go for it, @jni! 👏🏽"
I'm currently on leave, but a PR would be pretty easy, hopefully @brentp or someone else can handle it. This is the bit of code that needs modifying:
https://github.com/zarr-developers/zarr-python/blob/9ce2f328e9ff687c151cd73634f4d624590df817/zarr/core.py#L784-L789
Before the else:, add an elif clause:
elif is_just_a_list_of_integers(pure_selection):
return self.oindex[pure_selection]
For an appropriate implementation of "is just a list of integers." 😂
You probably want to also do this on __setitem__, here.
I'd be interested in implementing a solution to this issue. Beyond what was discussed above, I'm also interested in multi-dimensional indexing with slices. Something like:
# Example from above
rows = [3, 4]
a = np.arange(20).reshape((10, 2))
print(a[rows])
# What I'd like to have in addition
a[rows, :]
a[rows,1:3]
columns = [1,5,4]
a[:,columns]
a[3:4, colums]
The solution proposed by @jni would have to be expanded a little bit. I think this should work:
def is_pure_orthogonal_indexing(selection, ndim):
# Case 1: Selection is a single iterable of integers
if is_integer_list(selection) or is_integer_array(selection, ndim=1):
return True
# Case two: selection contains either zero or one integer iterables.
# All other selection elements are slices.
return (
isinstance(selection, tuple) and len(selection) == ndim and
sum(is_integer_list(elem) or is_integer_array(elem) for elem in selection) <=1 and
all(is_integer_list(elem) or is_integer_array(elem) or isinstance(elem, slice) for elem in selection)
)
And then the elif proposed by @jni would be elif is_pure_orthogonal_indexing(pure_selection, self.ndim).
If there are no objections to this approach, I'll be happy to make a PR.
NumPy's combined advanced and basic indexing is a little complex, but not terrible: https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing
As long as you follow those rules exactly (and test for compatibility with NumPy's rules), we can definitely extend the cases Zarr supports.
Safe to close this in favour of #1333?
Yes this was fixed by #1333, thanks for the reminder @MSanKeys963!