array-api
array-api copied to clipboard
RFC: add `abs2` to compute the square of the absolute value
This RFC proposes to add a new API, abs2
, to the standard for computing the square of the absolute value.
For complex numbers, the absolute value is the Euclidean norm of the real and imaginary components.
This API has particular value for complex numbers, as the alternative abs(x)**2
requires taking the square root and the alternative x*conj(x)
potentially requires additional memory allocation before computation.
Background
This was originally discussed in the Array API issue tracker in gh-153 (comment) and included in RFC gh-373 for adding complex number support to the specification.
Prior to that, a proposal was made on the NumPy issue tracker (https://github.com/numpy/numpy/issues/13179) to add abs2
in response to an issue originally filed on NumPy in 2013.
The abs2
proposal was given additional life in late 2021 with an effort to add to SciPy under an abs_sq
alias.
Part of the motivation for adding to SciPy is due to NumPy's desire to avoid expanding its API. This desire stems from not having a clear process or criteria for determining what should and what should not be included in NumPy.
However, the array API standard provides NumPy with such a process, as decision making for new APIs is effectively punted to this standards body.
Accordingly, were the array API specification to add abs2
, NumPy would follow suit.
This brings me to this proposal which seeks to add abs2
to the array API standard and effectively settle the question as to where abs2
should live.
And as abs2
has general applicably outside of NumPy and in other array libraries, abs2
seems like a reasonable and straightforward API to add to the standard.
Proposal
The API for abs2
would follow similar element-wise APIs...
def abs2(x: array, /) -> array
Notes
- This proposal does not follow https://github.com/scipy/scipy/pull/15390 in naming the API
abs_sq
, as (1) IMO the increase in characters does not provide any additional clarity in terms of behavior or functionality and (2)abs2
follows precedent elsewhere (Julia and GSL). - For real-valued numbers, the square of the absolute value is the same as the square
x*x
.
Prior Art
This API has particular value for complex numbers, as the alternative
abs(x)**2
requires taking the square root and the alternativex*conj(x)
potentially requires additional memory allocation before computation.
I am skeptical that this makes sense, but not strongly opposed.
There are nearly an infinite number of composed operations that potentially require additional memory allocation.
If this level of performance matters, users are not going to see satisfactory results from using the generic array API. They should use a JIT compiler or write specialized compiled routines by hand.
There is very little to even negative perf gain with the SciPy abs2
-equivalent and thus it's effectively on hold (see the benchmark which is also the last reply in that PR), so unless someone sits down and shows us it's worth the perf with, say, a hand-written CUDA kernel (it's actually very simple exercise, perhaps I should assign this to our interns 🙂), I am with Stephan on this. At least we don't see perf gain on CPUs.
@leofang that seems just a problem with that specific implementation. I would expect it to be quite a lot faster on any hardware. (Less memory throughput and if sqrt
is used, also less FLOPs. Although, I guess it may need similar optimizations as the alternatives.)
Personally, I am not sure I would mind adding it to NumPy, although I might prefer to not have it in the main namespace. (And I am lacking a good idea for that namespace.)
Anything involving calling sqrt
is doomed, so let's rule it out of the discussion.
AFAIR, "less memory throughput" has never been the primary goal of the API design, otherwise we'd have accepted the out=
and in-place proposals, so this -- plus you both were torn with finding the justification
There are nearly an infinite number of composed operations that potentially require additional memory allocation
and
although I might prefer to not have it in the main namespace
-- made me think unless it's really "a lot faster" (it's a literal quote) it's not worth keeping me torn too by considering it.
although I might prefer to not have it in the main namespace
-- made me think unless it's really "a lot faster" (it's a literal quote) it's not worth keeping me torn too by considering it.
That sounds about right. The Julia implementation linked to is a one-liner:
abs2(z::Complex) = real(z)*real(z) + imag(z)*imag(z)
That one-liner isn't going to be faster:
>>> x = np.ones(1_000_000, dtype=np.float64)
>>> y = x.copy()
>>> y = x.copy()
>>> y.dtype
>>> z = x + y * 1.j
>>> z.shape
(1000000,)
>>> y.shape
(1000000,)
>>> x.shape
(1000000,)
>>> x.dtype
dtype('float64')
>>> z.dtype
dtype('complex128')
>>> y.dtype
dtype('float64')
>>> %timeit z * np.conj(z)
3.43 ms ± 32.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x*x + y*y
2.21 ms ± 7.68 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
EDIT: apologies, first version contained an error - it is a little faster
If this level of performance matters, users are not going to see satisfactory results from using the generic array API. They should use a JIT compiler or write specialized compiled routines by hand.
Libraries which already do have a JIT compiler aren't going to be too happy with abs2
I imagine, since it's just as easy for them to rewrite abs(x)**2
, so the extra API surface doesn't do anything extra.
I don't think we have a good rule for when performance justified adding a composite function like this. There's some interaction with how frequently it's used and with ergonomics as well of course. My gut feeling is:
- Performance gain <2x is not worth considering
- In the 2x - 4x range it may be worth considering - but still infrequent to be compelling; it depends on usage frequency
- When it gets to >4x it's starting to look more compelling
Based on the most recent consortium meeting (2022-06-23), consensus was not to move forward with this proposal.
However, we would like to look into the possibility of providing generic APIs for user defined functions (i.e., standardized ways for array API consumers to, e.g., element-wise, apply arbitrary functions to arrays).
However, we would like to look into the possibility of providing generic APIs for user defined functions (i.e., standardized ways for array API consumers to, e.g., element-wise, apply arbitrary functions to arrays).
Let's open a new issue for tracking this. I'm not sure how actionable it is, but it's worth investigating. And also that issue may serve as a reference/collection point for other requests for composite APIs.
Thanks for the input all!