numpy icon indicating copy to clipboard operation
numpy copied to clipboard

ENH: Use array indexing preparation routines for flatiter objects

Open lysnikolaou opened this issue 9 months ago • 7 comments

  • Use prepare_index in iter_subscript and iter_ass_subscript. This fixes various cases that were broken before:
    • arr.flat[[True, True]]
    • arr.flat[[1.0, 1.0]]
    • arr.flat[()] = 0
  • Add more extensive tests for flatiter indexing operations

Closes #28314.

lysnikolaou avatar Mar 26 '25 10:03 lysnikolaou

This is a big refactor, so I think we'll need at least two experienced developers to go over the C code changes, so that might take a while. I'll try to do a pass focusing on the correctness of the C code soon. On a first, high-level pass this looks like mostly simplification and cleanup.

I think you should also try running the indexing benchmarks to see if there are any significant regressions in existing benchmarks. I think bench_indexing already captures several workflows that go through the changed low-level C code path.

It would also be nice to get new entries in the FlatIterIndexing benchmark for newly added functionality.

ngoldbaum avatar Mar 28 '25 14:03 ngoldbaum

These are the results of running the (old & new) benchmarks:

| Change   | Before [93898621] <main>   | After [cfcdabf0] <use-prepare-index-flatiter>   |     Ratio | Benchmark (Parameter)                                       |
|----------|----------------------------|-------------------------------------------------|-----------|-------------------------------------------------------------|
| +        | 87.0±0.4ns                 | 43.0±0.2ms                                      | 494276    | bench_indexing.FlatIterIndexing.time_flat_empty_tuple_index |
| +        | 115±3ns                    | 479±8ns                                         |      4.15 | bench_indexing.FlatIterIndexing.time_flat_bool_index_0d     |
| +        | 39.4±0.3ms                 | 42.8±0.3ms                                      |      1.09 | bench_indexing.FlatIterIndexing.time_flat_ellipsis_index    |
| +        | 3.95±0.04ms                | 4.29±0.07ms                                     |      1.09 | bench_indexing.FlatIterIndexing.time_flat_slice_index       |

It looks like having special cases for tuple, ellipses etc. (instead of going through prepare_index) did have an impact on performance. Should we try and keep those special cases in?

lysnikolaou avatar Mar 28 '25 16:03 lysnikolaou

Should we try and keep those special cases in?

Probably

ngoldbaum avatar Mar 28 '25 17:03 ngoldbaum

Should we try and keep those special cases in?

Probably

I added a couple of special cases for an empty tuple and boolean indexes. This fixes the two worst performance regressions. I feel that the rest are acceptable, since this goes through a much more complex code path to make sure that everything is set up correctly.

lysnikolaou avatar Apr 23 '25 13:04 lysnikolaou

I added the 2.3 milestone to make sure we don't drop reviewing this before cutting the release.

ngoldbaum avatar Apr 23 '25 17:04 ngoldbaum

I added the 2.3 milestone

@ngoldbaum I am about to push this off to 2.4 unless you want to put it in very soon.

charris avatar May 19 '25 18:05 charris

I spoke with Lysandros and he said it's OK to push this off. We'll coordinate on getting this reviewed soon.

ngoldbaum avatar May 19 '25 19:05 ngoldbaum

@seberg Sorry for taking so long with this and thanks a lot for the thorough review. It was really helpful!

I've addressed all of the feedback so it should be in a better state now. Looking foward to hearing your thoughts.

lysnikolaou avatar Jul 17 '25 11:07 lysnikolaou

Friendly ping here. I'm gonna be on PTO starting Aug 11th, so if we could get another round of feedback on this before I go, that'd be very helpful!

lysnikolaou avatar Aug 04 '25 12:08 lysnikolaou

Tests look good. This should be really close now @seberg! Also, thanks for the reviews and the patience!

lysnikolaou avatar Aug 06 '25 17:08 lysnikolaou

Unfortunately the change to include PyArray_SIZE(tmp_arr) != 0 fails the following test because it doesn't warn anymore. What do we wanna do here? Is this acceptable?

    def test_empty_string_flat_index_on_flatiter(self):
        a = np.arange(9).reshape((3, 3))
        b = np.array([], dtype="S")
        with pytest.warns(DeprecationWarning,
                          match="Invalid non-array indices for iterator objects are "
                                "deprecated"):
            assert_equal(a.flat[b.flat], np.array([]))

lysnikolaou avatar Aug 08 '25 13:08 lysnikolaou

I want to fix it to do the same thing as for arrays, but I am slightly confused why np.arange(3)[np.array([], dtype="S")] does the right thing.

seberg avatar Aug 08 '25 14:08 seberg

That's probably because of this new if in mapping.c.

            if (PyArray_SIZE(tmp_arr) == 0
                || (is_flatiter_object && !PyArray_ISINTEGER(tmp_arr) && !PyArray_ISBOOL(tmp_arr))) {

The string array is cast to an integer array and since it's got no elements, there's no issues there.

lysnikolaou avatar Aug 08 '25 14:08 lysnikolaou

Oh, nvm, let's just just remove the test, or only test for a string array. The array case just lets this one pass, because it can't doesn't distinguish between a [] and array([]).flat.

seberg avatar Aug 08 '25 14:08 seberg

The point is, that it isn't a string array, its a string flatiter... And we never handle it correctly, so I don't think we should care for this PR.

seberg avatar Aug 08 '25 14:08 seberg

Thanks @lysnikolaou this was a really nice cleanup, I think. And fixing up flatiter a bit more is also nice.

seberg avatar Aug 08 '25 14:08 seberg