WIP: a more principled take on dimensional reduction inits
As part of the work towards #52397 and #53945, this drastically simplifies the initialization of dimensional reductions and takes steps towards unifying the (non-dimensional) `mapreduce` with the `dims` flavor.
This completely removes the guesswork on initial values for the dimensional reductions. There are 2×3 possible ways a reduction should start, depending upon how many values are in the reduction and whether there's an `init` (a minimal sketch of these six cases follows the list):
- 0 values:
  - Have `init`? Return it. Otherwise:
  - Return `mapreduce_empty` (usually an error, sometimes an unambiguous value like `0.0`)
- 1 value (`a`):
  - Have `init`? Return `op(init, f(a))`. Otherwise:
  - Return `mapreduce_first` (typically `f(a)`)
- 2+ values (`a1`, `a2`, ...):
  - Have `init`? Start every chain with `op(op(init, f(a1)), f(a2))` and continue. Otherwise:
  - Start every chain with `op(f(a1), f(a2))` and continue
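To make those six starting points concrete, here's a minimal sketch in plain Julia. It's illustrative only: the name `start_reduction`, the `nothing` sentinel for "no init", and the use of `Base.mapreduce_empty`/`Base.mapreduce_first` as fallbacks are assumptions for this example, not this PR's actual implementation.

```julia
# Illustrative sketch (not this PR's code): the six ways a reduction can start,
# depending on how many values there are and whether an `init` was supplied.
# `nothing` stands in for "no init" here; Base's real sentinel is different.
function start_reduction(f, op, itr, init=nothing)
    y = iterate(itr)
    if y === nothing                                # 0 values
        return init === nothing ? Base.mapreduce_empty(f, op, eltype(itr)) : init
    end
    a1, state = y
    y = iterate(itr, state)
    if y === nothing                                # exactly 1 value
        return init === nothing ? Base.mapreduce_first(f, op, a1) : op(init, f(a1))
    end
    a2, state = y                                   # 2+ values: the first step of the chain
    v = init === nothing ? op(f(a1), f(a2)) : op(op(init, f(a1)), f(a2))
    while (y = iterate(itr, state)) !== nothing     # ...then keep folding
        a, state = y
        v = op(v, f(a))
    end
    return v
end
```

With that sketch, `start_reduction(abs2, +, [1, 2, 3])` folds as `(abs2(1) + abs2(2)) + abs2(3) == 14`, while `start_reduction(abs2, +, Int[])` falls back to `Base.mapreduce_empty` and returns `0`.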
When looking at dimensional reductions, each of the above becomes an array of multiple such values, which means we need an eltype. This uses the builtin incremental widening on every single update to the `R` vector. This wouldn't normally be so tricky: it's just two nested for loops. The trouble is that the naive double for loop leaves 100x+ performance on the table when it doesn't iterate in a cache-friendly manner. So you might indeed want to update a single value in `R` multiple times, and possibly widen as you do so.
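To illustrate just the loop-order point above (ignoring the widening machinery entirely), here's one cache-friendly way to fold `A` into an already-initialized `R`. This is a sketch under simplifying assumptions: the name `foldtodims!` is made up, `R` and `A` are assumed to share 1-based axes apart from `R`'s length-1 reduced dimensions, and no eltype widening is attempted.

```julia
# Illustrative sketch: fold `A` into a pre-initialized `R` whose reduced
# dimensions have length 1, visiting `A` in memory (column-major) order so the
# inner loop stays cache-friendly. Each element of `R` is updated many times.
function foldtodims!(op, f, R::AbstractArray, A::AbstractArray)
    Rlast = last(CartesianIndices(R))
    for IA in CartesianIndices(A)          # column-major traversal of A
        IR = min(IA, Rlast)                # clamp the reduced dims down to 1
        @inbounds R[IR] = op(R[IR], f(A[IA]))
    end
    return R
end
```

For example, with `A = rand(1000, 1000)` and `R = zeros(1, 1000)`, `foldtodims!(+, identity, R, A)` computes the same column sums as `sum(A; dims=1)` while walking `A` exactly once in memory order.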
It's so very tempting to do the standard Julian `mapreduce(f, op, A, dims) = mapreduce!(f, op, init(f, op, A, dims), A)` idiom. But that simply doesn't work because there's not a single way to initialize the beginning of a reduction. There are SIX. And once we add support for pre-allocated result vectors, you want even more to be able to just write, e.g., `sum!(r, A) = mapreduce!(identity, +, first_step!(identity, +, r, A), A)`. But at that point you've already done the first step, so you need to be able to continue the reduction from an arbitrary point in the iteration (see the sketch below).
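A hypothetical shape for that "continue from an arbitrary point" piece might look like the following. The name `continue_fold` and the choice of a bare iterator state as the resumption token are illustrative assumptions, not this PR's API.

```julia
# Illustrative sketch: after some `first_step!`-like call has consumed a few
# elements and produced a partial value `v` plus an iterator `state`, the rest
# of the reduction simply resumes the fold from that state.
function continue_fold(op, f, v, itr, state)
    while (y = iterate(itr, state)) !== nothing
        a, state = y
        v = op(v, f(a))
    end
    return v
end
```

The point is that whatever performs the first step has to hand back enough information for the remainder of the reduction to pick up where it left off.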
This is a WIP because I ~~expect it to have some fairly significant performance regressions at the moment. Some may be easy to recover, others may be harder~~ need to fix the stdlib failures and then let nanosoldier loose.
Closes: #45566, closes #47231, closes #55213, closes #31427, closes #41054, closes #54920, closes #42259, closes #38660, closes #21097, closes #32366.
Addresses half of #45562.
~~Looks like this is introducing a regression that's adjacent to https://github.com/JuliaLang/julia/issues/32366#issuecomment-503826041. sum([1 0 missing], dims=2) works on master but is currently broken here (and I don't think is tested).~~ Now fixed and tested.
I think I've addressed the most egregious of the performance issues, but I've only done spot tests so far. Now I need to tackle those stdlibs.
Not to open a can of worms, but it occurs to me that in cases like:
```julia
julia> struncate("🍕🍕🍕🍕🍕🍕xxxxxxxxxxx", 9, "…")
"🍕🍕🍕…xx"
```
The "centre" truncate call has 6 cells of 🍕 on the left, and 2 of x, which isn't very centred :grimacing:.
(Sorry Ian :sweat_smile:)
@tecosaur I suspect you meant that comment for #55351
Ah oops, yes sorry.
does the "help wanted" here refer to performance optimizations? or are there other, deeper, darker problems. that is --- would it be accurate to say that this PR implements all the desired semantics and causes a few benchmark regressions?
It's all of the above. This needs:
- Work to get the stdlibs passing tests
- Running of PkgEval and examining the results
- Fixing the issues @N5N3 flagged
- Running benchmarks and evaluating the impact
I have some proposed updates in this branch here https://github.com/adienes/julia/pull/1
what would be the recommended way to merge (or at least compare) to this PR? should I draft a PR directly onto `mbauman:mb/mapreducedim-init`?
@adienes I just pulled your commits directly. Thanks!