
Lengths of empty regular slices (in v2)

Open jpivarski opened this issue 2 years ago • 2 comments

Version of Awkward Array

HEAD

Description and code to reproduce

At least for v2 and variable-length arrays, we get the right zero-slicing edge case:

>>> import awkward._v2 as ak
>>> ak.Array([[1, 2, 3], [4, 5]])[:, []]
<Array [[], []] type='2 * var * int64'>

but v2 regular arrays come back with the wrong length (0 entries of size 3 instead of 2 entries of size 0):

>>> ak.to_regular(ak.Array([[1, 2, 3], [4, 5, 6]]), axis=1)[:, []]
<Array [] type='0 * 3 * int64'>

What we want is for this slice to build a RegularArray with an explicit zeros_length argument, which is the only way to make a RegularArray that has non-zero length yet contains lists of zero length:

>>> import numpy as np
>>> ak.Array(
...     ak.contents.RegularArray(
...         ak.contents.NumpyArray(np.arange(1, 7)),
...         size=0,
...         zeros_length=2,
...     )
... )
<Array [[], []] type='2 * 0 * int64'>
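
To see why an explicit zeros_length is needed, here is a minimal pure-Python sketch. ToyRegularArray is a hypothetical stand-in, not Awkward's implementation. It assumes the length of a regular array is normally derived as len(content) // size, which is undefined when size is 0, so the length has to be supplied separately in that case.

```python
class ToyRegularArray:
    """Hypothetical stand-in for a fixed-size list array (not Awkward's code)."""

    def __init__(self, content, size, zeros_length=0):
        self.content = list(content)
        self.size = size
        # Only consulted when size == 0, because the length cannot be
        # derived from the content in that case.
        self.zeros_length = zeros_length

    def __len__(self):
        if self.size == 0:
            return self.zeros_length
        return len(self.content) // self.size

    def to_list(self):
        # Cut the flat content into len(self) consecutive chunks of `size` items.
        return [
            self.content[i * self.size:(i + 1) * self.size]
            for i in range(len(self))
        ]


regular = ToyRegularArray(range(1, 7), size=3)
empty_lists = ToyRegularArray(range(1, 7), size=0, zeros_length=2)

print(regular.to_list())      # [[1, 2, 3], [4, 5, 6]]
print(empty_lists.to_list())  # [[], []]
```

Without zeros_length, a size-0 array in this sketch always reports length 0. This mirrors the wrong result shown earlier, where the outer length was lost.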

Pointed out by @grst.

v1 arrays are also incorrect, but this is an edge-case bug that only really needs to get fixed in v2. (v1 has only 4 more months left...)

jpivarski, Jul 19 '22 19:07

@grst, since you say that this is a blocker, I moved it up in the priority queue. Normally, an error about "What is the exact type of an array that doesn't contain any data?" would not be a high priority, but presumably it is for you because you need to make assumptions about that type to fit it into AnnData.

jpivarski, Jul 20 '22 13:07

We have some tests for those edge cases that fail currently. So while it blocks merging the PR, it does not block continuing development.

Thanks for looking into this, but no hurry! I am on vacation starting tomorrow, and @giovp also said he currently doesn't have time to focus on the AnnData PR.

grst, Jul 20 '22 14:07

I found another slicing edge case that is not fixed by the linked PR yet. I don't know if it's related or a separate issue, though:

Expected NumPy behaviour:

import numpy as np

np1 = np.ones((5, 7))
np1[:, []]
# array([], shape=(5, 0), dtype=float64)
np1[[], :]
# array([], shape=(0, 7), dtype=float64)

Awkward Array behaviour:

import awkward._v2 as ak  # v2 API, as in the original report
import numpy as np

a1 = ak.Array(np.ones((5, 7)))
a1[:, []]
# <Array [] type='0 * 7 * float64'>
a1[[], :]
# <Array [] type='0 * 7 * float64'>
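
The expectation behind the NumPy output above is that an empty index only empties the axis it is applied to, while the other axis keeps its length. A rough pure-Python sketch of that rule follows. expected_shape is a hypothetical helper written for illustration, and it assumes at most one axis is indexed by a list, so no advanced-index broadcasting is involved.

```python
def expected_shape(shape, index):
    """Result shape when each axis is indexed by a slice or by a list.

    Hypothetical helper; assumes at most one list index, so no broadcasting
    between several advanced indices has to be considered.
    """
    result = []
    for dim, item in zip(shape, index):
        if isinstance(item, slice):
            # A slice keeps as many entries as the range it selects.
            result.append(len(range(*item.indices(dim))))
        else:
            # A list index yields one entry per listed position.
            result.append(len(item))
    return tuple(result)


print(expected_shape((5, 7), (slice(None), [])))  # (5, 0)
print(expected_shape((5, 7), ([], slice(None))))  # (0, 7)
```

Under this rule the first Awkward result above would be expected to have type 5 * 0 * float64 rather than 0 * 7 * float64. The second result already matches.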

Version: v2 API, package installed with pip install git+https://github.com/scikit-hep/awkward/@ioanaif/fix-lengths-of-empty-regular-slices-1557

grst, Aug 12 '22 20:08

Hi! I just added this corner-case in the tests for the linked PR and it successfully passed. All empty slice cases should be covered now.

ioanaif, Aug 15 '22 10:08

I confirm this works with your branch @ioanaif! I had run pip without --force-reinstall, so I actually tested against the old version before.

grst, Aug 15 '22 11:08