ENH: Use array indexing preparation routines for flatiter objects
- Use
prepare_indexiniter_subscriptanditer_ass_subscript. This fixes various cases that were broken before:arr.flat[[True, True]]arr.flat[[1.0, 1.0]]arr.flat[()] = 0
- Add more extensive tests for
flatiterindexing operations
Closes #28314.
This is a big refactor, so I think we'll need at least two experienced developers to go over the C code changes, so that might take a while. I'll try to do a pass focusing on the correctness of the C code soon. On a first, high-level pass this looks like mostly simplification and cleanup.
I think you should also try running the indexing benchmarks to see if there are any significant regressions in existing benchmarks. I think bench_indexing already captures several workflows that go through the changed low-level C code path.
It would also be nice to get new entries in the FlatIterIndexing benchmark for newly added functionality.
These are the results of running the (old & new) benchmarks:
| Change | Before [93898621] <main> | After [cfcdabf0] <use-prepare-index-flatiter> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|-------------------------------------------------|-----------|-------------------------------------------------------------|
| + | 87.0±0.4ns | 43.0±0.2ms | 494276 | bench_indexing.FlatIterIndexing.time_flat_empty_tuple_index |
| + | 115±3ns | 479±8ns | 4.15 | bench_indexing.FlatIterIndexing.time_flat_bool_index_0d |
| + | 39.4±0.3ms | 42.8±0.3ms | 1.09 | bench_indexing.FlatIterIndexing.time_flat_ellipsis_index |
| + | 3.95±0.04ms | 4.29±0.07ms | 1.09 | bench_indexing.FlatIterIndexing.time_flat_slice_index |
It looks like having special cases for tuple, ellipses etc. (instead of going through prepare_index) did have an impact on performance. Should we try and keep those special cases in?
Should we try and keep those special cases in?
Probably
Should we try and keep those special cases in?
Probably
I added a couple of special cases for an empty tuple and boolean indexes. This fixes the two worst performance regressions. I feel that the rest are acceptable, since this goes through a much more complex code path to make sure that everything is set up correctly.
I added the 2.3 milestone to make sure we don't drop reviewing this before cutting the release.
I added the 2.3 milestone
@ngoldbaum I am about to push this off to 2.4 unless you want to put it in very soon.
I spoke with Lysandros and he said it's OK to push this off. We'll coordinate on getting this reviewed soon.
@seberg Sorry for taking so long with this and thanks a lot for the thorough review. It was really helpful!
I've addressed all of the feedback so it should be in a better state now. Looking foward to hearing your thoughts.
Friendly ping here. I'm gonna be on PTO starting Aug 11th, so if we could get another round of feedback on this before I go, that'd be very helpful!
Tests look good. This should be really close now @seberg! Also, thanks for the reviews and the patience!
Unfortunately the change to include PyArray_SIZE(tmp_arr) != 0 fails the following test because it doesn't warn anymore. What do we wanna do here? Is this acceptable?
def test_empty_string_flat_index_on_flatiter(self):
a = np.arange(9).reshape((3, 3))
b = np.array([], dtype="S")
with pytest.warns(DeprecationWarning,
match="Invalid non-array indices for iterator objects are "
"deprecated"):
assert_equal(a.flat[b.flat], np.array([]))
I want to fix it to do the same thing as for arrays, but I am slightly confused why np.arange(3)[np.array([], dtype="S")] does the right thing.
That's probably because of this new if in mapping.c.
if (PyArray_SIZE(tmp_arr) == 0
|| (is_flatiter_object && !PyArray_ISINTEGER(tmp_arr) && !PyArray_ISBOOL(tmp_arr))) {
The string array is cast to an integer array and since it's got no elements, there's no issues there.
Oh, nvm, let's just just remove the test, or only test for a string array. The array case just lets this one pass, because it can't doesn't distinguish between a [] and array([]).flat.
The point is, that it isn't a string array, its a string flatiter... And we never handle it correctly, so I don't think we should care for this PR.
Thanks @lysnikolaou this was a really nice cleanup, I think. And fixing up flatiter a bit more is also nice.