ak.any with keepdims changes layout (with implications for slicing)
Version of Awkward Array
2.8.5
Description and code to reproduce
If I construct a 2D boolean array, and use it to slice itself, I get a 2D result:
>>> test_arr = ak.Array([[True]])
>>> test_arr[test_arr]
<Array [[True]] type='1 * var * bool'>
When I apply ak.any with keepdims=True, the result looks superficially similar, but slicing now removes a dimension:
>>> test_arr = ak.any(test_arr, axis=1, keepdims=True)
>>> test_arr
<Array [[True]] type='1 * 1 * bool'>
>>> test_arr[test_arr]
<Array [True] type='1 * bool'>
This differs from awkward1, where the result would still be 2D in the final case, which is the more intuitive and useful behaviour, I feel. Is this a deliberate change, and in either case would it be possible to switch back to the awkward 1 behaviour, please?
@Dominic-Stafford - thanks for bringing it up! In awkward 2, the boolean indexing behavior has changed to be more NumPy-consistent in spirit - but due to the jagged nature of Awkward Arrays, it's not a 1:1 mapping.
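Here's the crux:
- When you slice with a jagged (var) array of booleans, Awkward filters each inner list with the corresponding inner list of the mask, so the nesting is kept. That is why the first example stays 2D.
- When the boolean array is regular (every dimension has a fixed size), Awkward treats it like a NumPy boolean array. The positions of the True values become integer indices, effectively as with np.nonzero(mask), and the selected elements are gathered into a single flat list.
After applying ak.any(..., keepdims=True), the structure becomes:
[[True]]
i.e., still 2D, but now regular (type 1 * 1 * bool) rather than jagged (1 * var * bool).
But now:
test_arr[test_arr]
This uses a regular mask, so the NumPy-style rule applies: the only True value sits at position (0, 0), and the element at that position is returned.
The two masked dimensions collapse into one, resulting in:
[True]
with type 1 * bool, i.e., 1D.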
This is because a regular mask is interpreted NumPy-style, which drops structure more aggressively than Awkward 1 did. This change is part of a broader rewrite in Awkward v2, which simplified the internals and changed behaviors to be:
- More consistent with jagged/variable-length array logic.
- Aligned with broadcasting and indexing conventions.
- More performant and composable, even if that sometimes means dropping earlier heuristics.
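The difference between the two cases can be sketched with plain Python lists. This is a hypothetical model written for this explanation, not Awkward's implementation: `jagged_mask_select` mimics a variable-length (`var`) mask by filtering each inner list, and `regular_mask_select` mimics a fixed-size mask by collecting the (i, j) positions of every True value, as `np.nonzero` would, and gathering those elements into one flat list. Both helper names are made up.

```python
# Hypothetical pure-Python model of the two boolean-mask rules (not Awkward's code).

def jagged_mask_select(data, mask):
    """Model of a jagged (var) mask: filter each inner list, keep the outer structure."""
    if len(data) != len(mask):
        raise IndexError("outer lengths differ")
    result = []
    for row, row_mask in zip(data, mask):
        if len(row) != len(row_mask):
            raise IndexError("inner lengths differ")
        result.append([x for x, keep in zip(row, row_mask) if keep])
    return result


def regular_mask_select(data, mask):
    """Model of a regular (fixed-size) mask treated NumPy-style: gather the (i, j)
    positions of every True entry, then pick data[i][j] into one flat list."""
    coords = [(i, j) for i, row in enumerate(mask) for j, keep in enumerate(row) if keep]
    return [data[i][j] for i, j in coords]


test_arr = [[True]]
print(jagged_mask_select(test_arr, test_arr))   # [[True]]  -> still 2D
print(regular_mask_select(test_arr, test_arr))  # [True]    -> 1D

# The lost dimension does not depend on the mask being all True:
values = [[1, 2], [3, 4]]
mask = [[True, False], [False, True]]
print(jagged_mask_select(values, mask))   # [[1], [4]]
print(regular_mask_select(values, mask))  # [1, 4]
```

If this model is right, converting a fixed-size mask to a variable-length one in Awkward (for example with `ak.from_regular(mask, axis=1)`) should move it onto the jagged path; that is an untested assumption here.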
If you think restoring the dimensionality is needed, I'd suggest flagging this issue as a feature request, perhaps for a flag or function that restores Awkward 1-style boolean slicing automatically.
Hi @ianna, I'm also a bit curious about an inconsistency with NumPy here. As you can see below, awkward allows this slicing for a regular array, while NumPy raises an error for the same array. Is there a good reason for this that I'm missing?
In [57]: x = ak.to_regular([[True, False], [False, True]])
In [58]: mask = np.any(x, axis=1, keepdims=True)
In [59]: x
Out[59]: <Array [[True, False], [False, True]] type='2 * 2 * bool'>
In [60]: mask
Out[60]: <Array [[True], [True]] type='2 * 1 * bool'>
In [61]: x[mask]
Out[61]: <Array [True, False] type='2 * bool'>
In [62]: x = np.array([[True, False], [False, True]])
In [63]: mask = np.any(x, axis=1, keepdims=True)
In [64]: x
Out[64]:
array([[ True, False],
       [False,  True]])
In [65]: mask
Out[65]:
array([[ True],
       [ True]])
In [66]: x[mask]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[66], line 1
----> 1 x[mask]
IndexError: boolean index did not match indexed array along axis 1; size of axis is 2 but size of corresponding boolean axis is 1
@ikrommyd - NumPy requires a boolean mask to match the array's length along every axis it indexes:
- x has shape (2,2).
- mask has shape (2,1) → does not match axis 1 size (2).
- Boolean indexing rules in NumPy are strict: each dimension of the mask must have exactly the same length as the corresponding dimension of the array, and boolean masks are not broadcast.
So NumPy refuses because axis=1 of x has length 2, but your mask has length 1 along that axis.
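NumPy's rule can be sketched as a plain comparison of shapes. `check_boolean_index` is a hypothetical helper, not part of NumPy; its error message is modelled on the one in the traceback above.

```python
def check_boolean_index(array_shape, mask_shape):
    """Hypothetical sketch of NumPy's rule: each axis of a boolean mask must have
    exactly the same length as the corresponding leading axis of the array."""
    if len(mask_shape) > len(array_shape):
        raise IndexError("too many indices for array")
    for axis, (size, mask_size) in enumerate(zip(array_shape, mask_shape)):
        if size != mask_size:
            raise IndexError(
                f"boolean index did not match indexed array along axis {axis}; "
                f"size of axis is {size} but size of corresponding boolean axis is {mask_size}"
            )


check_boolean_index((2, 2), (2, 2))  # OK: shapes match exactly
check_boolean_index((2, 2), (2,))    # OK: a 1D mask only indexes axis 0 (whole rows)

try:
    check_boolean_index((2, 2), (2, 1))  # the shape produced by keepdims=True
except IndexError as err:
    # boolean index did not match indexed array along axis 1; size of axis is 2 but size of corresponding boolean axis is 1
    print(err)
```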
Awkward is designed to handle jagged and irregular arrays, so it’s much more flexible:
- Here x and mask are both regular, so mask is handled NumPy-style: it is effectively turned into integer indices, np.nonzero(mask), which here is ([0, 1], [0, 0]).
- Unlike NumPy, Awkward does not check that the mask's shape matches the array's, so the mask's length-1 inner axis is accepted.
- x[mask] therefore picks x[0, 0] and x[1, 0], giving the flat result [True, False].
Essentially, Awkward is more “forgiving” because it’s meant to handle nested structures of varying lengths.
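The `[True, False]` in `Out[61]` is consistent with Awkward turning the regular mask into integer coordinates (what `np.nonzero(mask)` returns) and then indexing without NumPy's shape check. The sketch below reproduces that arithmetic with plain lists; `nonzero` and `take_coords` are hypothetical helpers, and this models the observed result rather than Awkward's actual code path.

```python
def nonzero(mask):
    """Row and column indices of the True entries of a 2D mask, in row-major order
    (the same idea as np.nonzero for a 2D array)."""
    rows, cols = [], []
    for i, row in enumerate(mask):
        for j, keep in enumerate(row):
            if keep:
                rows.append(i)
                cols.append(j)
    return rows, cols


def take_coords(data, rows, cols):
    """Integer (advanced) indexing: pick data[rows[k]][cols[k]] for every k."""
    return [data[i][j] for i, j in zip(rows, cols)]


x = [[True, False], [False, True]]
mask = [[True], [True]]  # any(x, axis=1, keepdims=True): shape (2, 1)

rows, cols = nonzero(mask)
print(rows, cols)                  # [0, 1] [0, 0]
print(take_coords(x, rows, cols))  # [True, False]  (x[0][0] and x[1][0])
```

Because the mask's inner length (1) is never compared with x's inner length (2), column index 0 is simply valid in x and no error is raised.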
@Dominic-Stafford - this is a subtle behavior change between Awkward 1.x and Awkward 2.x.
What’s happening
Example 1: slicing with itself
import awkward as ak
test_arr = ak.Array([[True]])
test_arr[test_arr]
- test_arr is 1 * var * bool → a 1 × variable-length array.
- Boolean indexing with a jagged mask filters each inner list, keeping the elements where the mask is True.
- Result: [[True]] → still a 2D array, as expected.
Example 2: using ak.any with keepdims=True
test_arr = ak.Array([[True]])
test_arr2 = ak.any(test_arr, axis=1, keepdims=True)
print(test_arr2)
- test_arr2 is 1 * 1 * bool → shape 1 × 1 (keepdims preserves the axis).
Now you slice:
test_arr2[test_arr2]
- Result: [True] → shape 1 (a dimension is removed).
Why?
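In Awkward 2.x:
- A regular boolean mask (here 1 * 1 * bool) is treated like a NumPy boolean array, so the masked dimensions collapse into one flat dimension. A jagged mask (1 * var * bool) filters each inner list and keeps the structure.
- Even with keepdims=True, the result of array[mask] loses a dimension, whether or not the mask is all True, because keepdims produces exactly such a regular mask.
- This is a deliberate change from Awkward 1.x, which kept the 2D structure in this situation.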
Why the change was made
Awkward 2.x introduced more consistent semantics for jagged arrays:
- Boolean indexing is meant to always return the selected elements along the masked axis.
- In nested/jagged structures, keeping a “dimension of length 1” after masking can be ambiguous.
- So keepdims only affects aggregation functions like any or sum, not slicing.
keepdims affects the output of ak.any, but boolean masking uses the mask as a selector, and a regular mask selector is “flattened” in Awkward 2.x.
Can you get the old behavior? Yes, but you need to explicitly wrap the result in a new dimension after masking. For example:
import awkward as ak
test_arr = ak.Array([[True]])
test_arr2 = ak.any(test_arr, axis=1, keepdims=True)
# wrap with a list to restore 2D structure
result = ak.Array([test_arr2[test_arr2]])
print(repr(result))
Output:
<Array [[True]] type='1 * var * bool'>
You now mimic the Awkward 1.x behavior.
For larger arrays, you might need ak.unflatten or ak.contents.RegularArray (ak.layout was renamed to ak.contents in Awkward 2.x), depending on the shape.
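For arrays with more than one row, wrapping the whole result in a single list merges all rows together. One option is to regroup the flat selection by the number of True values per row, which is what `ak.unflatten(flat, counts)` does with a counts array. In Awkward the counts could presumably come from `ak.count_nonzero(mask, axis=1)`, but that is an assumption we have not checked against every layout. The plain-Python `unflatten` below is a hypothetical stand-in that only shows the regrouping arithmetic, assuming the mask and the array have the same shape.

```python
def unflatten(flat, counts):
    """Split a flat list into consecutive sublists of the given lengths
    (the same idea as ak.unflatten(flat, counts) with a 1D counts array)."""
    if sum(counts) != len(flat):
        raise ValueError("counts do not add up to the length of the flat list")
    result, start = [], 0
    for n in counts:
        result.append(flat[start:start + n])
        start += n
    return result


data = [[1, 2], [3, 4], [5, 6]]
mask = [[True, False], [True, True], [False, False]]

# Flat, NumPy-style selection in row-major order, as a regular mask gives in 2.x
flat = [x for row, row_mask in zip(data, mask) for x, keep in zip(row, row_mask) if keep]
counts = [sum(row_mask) for row_mask in mask]  # number of True values per row

print(flat)                     # [1, 3, 4]
print(unflatten(flat, counts))  # [[1], [3, 4], []]
```

Unlike the single-list wrap, this keeps one sublist per original row, including empty ones.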
Summary
| Feature | Awkward 1.x | Awkward 2.x |
|---|---|---|
| Boolean indexing with a regular (fixed-size) 2D mask | Keeps 2D structure | Collapses the masked dimensions into one |
| keepdims=True on ak.any | Works for slice too | Only affects ak.any, not slice |
| Workaround | Not needed | Wrap in ak.Array([ ... ]) or ak.unflatten |
So this is deliberate, not a bug, because Awkward 2.x emphasizes consistency in jagged/nested semantics.
Please, let me know if this explanation is satisfactory and we can close the issue. Thanks!
Hi @ianna, sorry for the long silence on my side. Other projects have absorbed all of my time and I haven't managed to work on this migration since I opened this issue. On the practical side we can indeed work around this. However, to me the new implementation feels inconsistent within awkward: slices now reduce the dimension by one if the mask's inner dimension is regular but not if it is jagged, and it's not immediately obvious to the user which kind this inner dimension is. Also, from a convenience point of view, I use keepdims=True to keep dimensions for later steps of the calculation, but an object that still reduces the dimension when used to slice is unhelpful for this. Please correct me if I've misunderstood, but the old implementation made more intuitive sense to me, and that matters more to me than consistency with NumPy.