zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

indexing with list not supported

Open brentp opened this issue 3 years ago • 10 comments

Minimal, reproducible code sample, a copy-pastable example if possible

import numpy as np
import zarr

rows = [3, 4]
a = np.arange(20).reshape((10, 2))

print(a[rows]) # OK

z = zarr.array(a)
print(z[rows]) # indexerror

Problem description

Based on this I expected this to work and to return the same rows as numpy. But it raises in index error.

Version and installation information

Please provide the following:

  • Value of zarr.__version__: 2.11.3

brentp avatar May 17 '22 09:05 brentp

Hi @brentp. Which cells are you expecting to receive? e.g.

In [1]: import zarr, numpy as np

In [2]: z = zarr.array(np.arange(20).reshape(10,2))

In [3]: z[:]
Out[3]:
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [4]: z[[0, 1], [1, 0]]
Out[4]: array([1, 2])

joshmoore avatar May 17 '22 09:05 joshmoore

Hi, I expect to get the same result as numpy, but index it's an indexerror. I have updated the example to be more complete with this code:

import numpy as np
import zarr

rows = [3, 4]
a = np.arange(20).reshape((10, 2))

print(a[rows])

z = zarr.array(a)
print(z[rows])

brentp avatar May 17 '22 09:05 brentp

Ah, I see. I remember there being limitations but I don't remember this one. @jni?

joshmoore avatar May 17 '22 09:05 joshmoore

Zarr also supports oindex and vindex, which may be another way to get at the functionality needed here

jakirkham avatar May 17 '22 18:05 jakirkham

z.oindex[rows] certainly returns the same values, but vindex is being used by z[rows].

joshmoore avatar May 17 '22 20:05 joshmoore

There's some overlap between these which can be a kind of confusing aspect of advanced indexing. Hence why vindex & oindex were added before advanced indexing was

jakirkham avatar May 17 '22 20:05 jakirkham

Yes, basically as I remember it @shoyer raised the point that mixing integer indexing with slices (which this use case implicitly uses) can be tricky to get right, so we left that out of the fancy indexing implementation in #725 for a braver soul to tackle. :stuck_out_tongue_winking_eye: The discussion starts here:

https://github.com/zarr-developers/zarr-python/pull/725#issuecomment-832192992

As I see it, the next "easy" step is to allow leading integer indexing, which won't reorder axes in either NumPy or zarr.Array.vindex (I think!). What do you think about this @joshmoore @jakirkham @shoyer?

jni avatar May 18 '22 05:05 jni

Sorry, I failed to note that no one else has responded. I'm all for incremental steps, but maybe there's a question of how to document the rest of the steps needed for the likes of @brentp.

joshmoore avatar May 31 '22 14:05 joshmoore

Having apparently killed the conversation, I'm inclined to say, "Go for it, @jni! 👏🏽"

joshmoore avatar Jul 11 '22 12:07 joshmoore

I'm currently on leave, but a PR would be pretty easy, hopefully @brentp or someone else can handle it. This is the bit of code that needs modifying:

https://github.com/zarr-developers/zarr-python/blob/9ce2f328e9ff687c151cd73634f4d624590df817/zarr/core.py#L784-L789

Before the else:, add an elif clause:

elif is_just_a_list_of_integers(pure_selection):
    return self.oindex[pure_selection]

For an appropriate implementation of "is just a list of integers." 😂

You probably want to also do this on __setitem__, here.

jni avatar Jul 13 '22 09:07 jni

I'd be interested in implementing a solution to this issue. Beyond what was discussed above, I'm also interested in multi-dimensional indexing with slices. Something like:

# Example from above
rows = [3, 4]
a = np.arange(20).reshape((10, 2))
print(a[rows])

# What I'd like to have in addition
a[rows, :]
a[rows,1:3]

columns = [1,5,4]
a[:,columns]
a[3:4, colums]

The solution proposed by @jni would have to be expanded a little bit. I think this should work:

def is_pure_orthogonal_indexing(selection, ndim):
    # Case 1: Selection is a single iterable of integers
    if is_integer_list(selection) or is_integer_array(selection, ndim=1):
        return True

    # Case two: selection contains either zero or one integer iterables. 
    # All other selection elements are slices.
    return (
        isinstance(selection, tuple) and len(selection) == ndim and 
        sum(is_integer_list(elem) or is_integer_array(elem) for elem in selection) <=1 and
        all(is_integer_list(elem) or is_integer_array(elem) or isinstance(elem, slice) for elem in selection)
    )

And then the elif proposed by @jni would be elif is_pure_orthogonal_indexing(pure_selection, self.ndim).

If there are no objections to this approach, I'll be happy to make a PR.

AndreasAlbertQC avatar Jan 24 '23 09:01 AndreasAlbertQC

NumPy's combined advanced and basic indexing is a little complex, but not terrible: https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing

As long as you follow those rules exactly (and test for compatibility with NumPy's rules), we can definitely extend the cases Zarr supports.

shoyer avatar Jan 24 '23 18:01 shoyer

Safe to close this in favour of #1333?

sanketverma1704 avatar Mar 27 '23 19:03 sanketverma1704

Yes this was fixed by #1333, thanks for the reminder @MSanKeys963!

jni avatar Mar 28 '23 07:03 jni