array-api icon indicating copy to clipboard operation
array-api copied to clipboard

RFC: add `abs2` to compute the square of the absolute value

Open kgryte opened this issue 2 years ago • 7 comments

This RFC proposes to add a new API, abs2, to the standard for computing the square of the absolute value.

For complex numbers, the absolute value is the Euclidean norm of the real and imaginary components.

This API has particular value for complex numbers, as the alternative abs(x)**2 requires taking the square root and the alternative x*conj(x) potentially requires additional memory allocation before computation.

Background

This was originally discussed in the Array API issue tracker in gh-153 (comment) and included in RFC gh-373 for adding complex number support to the specification.

Prior to that, a proposal was made on the NumPy issue tracker (https://github.com/numpy/numpy/issues/13179) to add abs2 in response to an issue originally filed on NumPy in 2013.

The abs2 proposal was given additional life in late 2021 with an effort to add to SciPy under an abs_sq alias.

Part of the motivation for adding to SciPy is due to NumPy's desire to avoid expanding its API. This desire stems from not having a clear process or criteria for determining what should and what should not be included in NumPy.

However, the array API standard provides NumPy with such a process, as decision making for new APIs is effectively punted to this standards body.

Accordingly, were the array API specification to add abs2, NumPy would follow suit.

This brings me to this proposal which seeks to add abs2 to the array API standard and effectively settle the question as to where abs2 should live.

And as abs2 has general applicably outside of NumPy and in other array libraries, abs2 seems like a reasonable and straightforward API to add to the standard.

Proposal

The API for abs2 would follow similar element-wise APIs...

def abs2(x: array, /) -> array

Notes

  • This proposal does not follow https://github.com/scipy/scipy/pull/15390 in naming the API abs_sq, as (1) IMO the increase in characters does not provide any additional clarity in terms of behavior or functionality and (2) abs2 follows precedent elsewhere (Julia and GSL).
  • For real-valued numbers, the square of the absolute value is the same as the square x*x.

Prior Art

  • C++ has a norm API since C++11. This is not an ideal name, as, while the value calculated by this function is the "field norm", this is likely not to match user expectations concerning a "norm", which is better represented by abs.
  • GSL has abs2.
  • Julia has abs2.

kgryte avatar Jun 22 '22 23:06 kgryte

This API has particular value for complex numbers, as the alternative abs(x)**2 requires taking the square root and the alternative x*conj(x) potentially requires additional memory allocation before computation.

I am skeptical that this makes sense, but not strongly opposed.

There are nearly an infinite number of composed operations that potentially require additional memory allocation.

If this level of performance matters, users are not going to see satisfactory results from using the generic array API. They should use a JIT compiler or write specialized compiled routines by hand.

shoyer avatar Jun 23 '22 01:06 shoyer

There is very little to even negative perf gain with the SciPy abs2-equivalent and thus it's effectively on hold (see the benchmark which is also the last reply in that PR), so unless someone sits down and shows us it's worth the perf with, say, a hand-written CUDA kernel (it's actually very simple exercise, perhaps I should assign this to our interns 🙂), I am with Stephan on this. At least we don't see perf gain on CPUs.

leofang avatar Jun 23 '22 01:06 leofang

@leofang that seems just a problem with that specific implementation. I would expect it to be quite a lot faster on any hardware. (Less memory throughput and if sqrt is used, also less FLOPs. Although, I guess it may need similar optimizations as the alternatives.)

Personally, I am not sure I would mind adding it to NumPy, although I might prefer to not have it in the main namespace. (And I am lacking a good idea for that namespace.)

seberg avatar Jun 23 '22 02:06 seberg

Anything involving calling sqrt is doomed, so let's rule it out of the discussion.

AFAIR, "less memory throughput" has never been the primary goal of the API design, otherwise we'd have accepted the out= and in-place proposals, so this -- plus you both were torn with finding the justification

There are nearly an infinite number of composed operations that potentially require additional memory allocation

and

although I might prefer to not have it in the main namespace

-- made me think unless it's really "a lot faster" (it's a literal quote) it's not worth keeping me torn too by considering it.

leofang avatar Jun 23 '22 03:06 leofang

although I might prefer to not have it in the main namespace

-- made me think unless it's really "a lot faster" (it's a literal quote) it's not worth keeping me torn too by considering it.

That sounds about right. The Julia implementation linked to is a one-liner:

abs2(z::Complex) = real(z)*real(z) + imag(z)*imag(z)

That one-liner isn't going to be faster:

>>> x = np.ones(1_000_000, dtype=np.float64)
>>> y = x.copy()
>>> y = x.copy()
>>> y.dtype
>>> z = x + y * 1.j
>>> z.shape
(1000000,)
>>> y.shape
(1000000,)
>>> x.shape
(1000000,)
>>> x.dtype
dtype('float64')
>>> z.dtype
dtype('complex128')
>>> y.dtype
dtype('float64')

>>> %timeit z * np.conj(z)
3.43 ms ± 32.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit x*x + y*y
2.21 ms ± 7.68 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

EDIT: apologies, first version contained an error - it is a little faster

If this level of performance matters, users are not going to see satisfactory results from using the generic array API. They should use a JIT compiler or write specialized compiled routines by hand.

Libraries which already do have a JIT compiler aren't going to be too happy with abs2 I imagine, since it's just as easy for them to rewrite abs(x)**2, so the extra API surface doesn't do anything extra.

rgommers avatar Jun 23 '22 13:06 rgommers

I don't think we have a good rule for when performance justified adding a composite function like this. There's some interaction with how frequently it's used and with ergonomics as well of course. My gut feeling is:

  • Performance gain <2x is not worth considering
  • In the 2x - 4x range it may be worth considering - but still infrequent to be compelling; it depends on usage frequency
  • When it gets to >4x it's starting to look more compelling

rgommers avatar Jun 23 '22 13:06 rgommers

Based on the most recent consortium meeting (2022-06-23), consensus was not to move forward with this proposal.

However, we would like to look into the possibility of providing generic APIs for user defined functions (i.e., standardized ways for array API consumers to, e.g., element-wise, apply arbitrary functions to arrays).

kgryte avatar Jun 23 '22 18:06 kgryte

However, we would like to look into the possibility of providing generic APIs for user defined functions (i.e., standardized ways for array API consumers to, e.g., element-wise, apply arbitrary functions to arrays).

Let's open a new issue for tracking this. I'm not sure how actionable it is, but it's worth investigating. And also that issue may serve as a reference/collection point for other requests for composite APIs.

Thanks for the input all!

rgommers avatar Sep 05 '22 17:09 rgommers