awkward icon indicating copy to clipboard operation
awkward copied to clipboard

np.real, np.imag, np.round, np.around can't take an Awkward Array as input

Open jpivarski opened this issue 7 months ago • 10 comments

Version of Awkward Array

HEAD

Description and code to reproduce

This is more like a strange "gotcha" of NumPy, but np.real and np.imag are not ufuncs, unlike, for example, np.conjugate.

>>> np.real
<function real at 0x7e73131d8930>
>>> np.imag
<function imag at 0x7e73131d8b30>
>>> np.conjugate
<ufunc 'conjugate'>

Therefore, it would require some special handling to make it work in Awkward, but not much. Here's what happens now:

>>> np.real(ak.Array([[1+0.1j, 2+0.2j, 3+0.3j], [], [4+0.4j, 5+0.5j]]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward/src/awkward/highlevel.py", line 1527, in __array_function__
    return ak._connect.numpy.array_function(
  File "/home/jpivarski/irishep/awkward/src/awkward/_connect/numpy.py", line 109, in array_function
    rectilinear_args = tuple(_to_rectilinear(x, backend) for x in args)
  File "/home/jpivarski/irishep/awkward/src/awkward/_connect/numpy.py", line 109, in <genexpr>
    rectilinear_args = tuple(_to_rectilinear(x, backend) for x in args)
  File "/home/jpivarski/irishep/awkward/src/awkward/_connect/numpy.py", line 78, in _to_rectilinear
    return layout.to_backend(backend).to_backend_array(allow_missing=True)
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/content.py", line 1021, in to_backend_array
    return self._to_backend_array(allow_missing, backend)
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/listoffsetarray.py", line 2078, in _to_backend_array
    return self.to_RegularArray()._to_backend_array(allow_missing, backend)
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/listoffsetarray.py", line 288, in to_RegularArray
    self._backend.maybe_kernel_error(
  File "/home/jpivarski/irishep/awkward/src/awkward/_backends/backend.py", line 67, in maybe_kernel_error
    raise ValueError(self.format_kernel_error(error))
ValueError: cannot convert to RegularArray because subarray lengths are not regular (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-26/awkward-cpp/src/cpu-kernels/awkward_ListOffsetArray_toRegularArray.cpp#L22)

It's trying to auto-convert the Awkward Array into a NumPy array to use the function, and it can't because this one is ragged.

jpivarski avatar Dec 29 '23 19:12 jpivarski

A workaround for np.real(a):

(a + np.conjugate(a)) / 2

and a workaround for np.imag(a):

(a - np.conjugate(a)) / 2j

The Array API spec says that these should have floating point type, which can be determined from the complex a.dtype using

.astype(f"f{a.dtype.itemsize // 2}")

if it weren't for the

ComplexWarning: Casting complex values to real discards the imaginary part

That's not something that can be fixed with np.errstate...

jpivarski avatar Dec 29 '23 19:12 jpivarski

(It's not necessary to self-assign! I'm just logging these as I go. :)

I found another one: np.round is not a ufunc, but you'd think it is.

>>> np.round
<function round at 0x7e6a458334b0>

I'll have a work-around for this in just a moment.

jpivarski avatar Dec 30 '23 11:12 jpivarski

Given a,

a = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])

use np.modf, np.greater_equal, and np.add:

frac, whole = np.modf(a)
whole + (frac >= 0.5)

The Array API wants the output dtype to be the same as the input dtype, so it's fine that the result is floating-point. The new API also doesn't have a decimals argument (which might be why it's not a ufunc).

This function needs special handling if the inputs are complex—it must be split into real and imaginary parts, the above procedure must be applied to those parts, and then the result is recombined into complex numbers.

NumPy's old API and its implementation of the new API both seem to be making a mistake for complex numbers:

>>> import numpy as np
>>> import numpy.array_api as xp

>>> np.round(np.asarray([1.1+0.1j, 2.2+0.2j, 5.5+0.5j, 6.6+0.6j]))
array([1.+0.j, 2.+0.j, 6.+0.j, 7.+1.j])

>>> xp.round(xp.asarray([1.1+0.1j, 2.2+0.2j, 5.5+0.5j, 6.6+0.6j]))
Array([1.+0.j, 2.+0.j, 6.+0.j, 7.+1.j], dtype=complex128)

I think that 5.5+0.5j should round to 6+1j, not 6+0j. I'll report it.

jpivarski avatar Dec 30 '23 12:12 jpivarski

Nope:

For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc.

(The Array API documentation should specify things like this!)

My original formulation is also wrong for negative numbers.

jpivarski avatar Dec 30 '23 12:12 jpivarski

Got it: given a

>>> a = np.array([-2.5, -1.5, -0.6, -0.5, -0.4, 0.4, 0.5, 0.6, 1.5, 2.5])

>>> np.round(a)
array([-2., -2., -1., -0., -0.,  0.,  0.,  1.,  2.,  2.])

>>> frac, whole = np.modf(a)
>>> abs_frac = np.absolute(frac)
>>> whole + ((abs_frac == 0.5) * (whole % 2 != 0) + (abs_frac > 0.5)) * np.sign(frac)
array([-2., -2., -1., -0., -0.,  0.,  0.,  1.,  2.,  2.])

And then complex numbers are computed by applying this independently to the real and imaginary parts.

Edit: made another correction, so that this works for absolute-value fractions equal to 0.5 and not equal to 0.5.

jpivarski avatar Dec 30 '23 12:12 jpivarski

https://github.com/data-apis/array-api/issues/726

jpivarski avatar Dec 30 '23 12:12 jpivarski

For round, you can use np.around

agoose77 avatar Dec 30 '23 13:12 agoose77

Although np.around is not np.round, (despite the np.around documentation saying that it's an alias), it's also not a ufunc:

>>> np.around
<function around at 0x763385fef6b0>

>>> isinstance(np.round, np.ufunc)
False
>>> isinstance(np.around, np.ufunc)
False

and so Awkward would need the same NEP-18 overload for np.around as it does for np.round. I'll add that to the title.

jpivarski avatar Dec 30 '23 14:12 jpivarski

Ah now I remember - IIRC at one point I patched awkward to implement the round ufunc, but you could only overload around, not round. I don't know if that's still the case!

agoose77 avatar Dec 30 '23 16:12 agoose77

That's strange, since they're both not ufuncs. I would have thought that both of them would need to have a NEP-18 style overload (__array_function__).

jpivarski avatar Dec 30 '23 18:12 jpivarski

I added support for angle along with real and imag in the branch 2917-npreal-npimag-npround-nparound-cant-take-an-awkward-array-as-input. I have not implemented real_if_close yet, which has the same issues.

Looking into round and around. Indeed the documentation claims that round and around are aliases. What is our policy for supporting numpy functions? real and imag (and abs which is a ufunc) are part of the array API standard, while angle is not. round is, around is not.

round is related to ceil, fix, floor, rint, and trunc (according to the docstring for round). Of those, only round and fix are not ufuncs.

Should we support round, but not around or fix? Do we draw a line somewhere between numpy operations that we do and do not intend to support? IMO, of these functions, floor and ceil are generally useful while the others are probably quite niche.

tcawlfield avatar Mar 14 '24 20:03 tcawlfield

My view is that we can do this iteratively. It's definitely useful to have real, imag, and round. With ceil and floor, we then have a fairly comprehensive set of float manipulation routines. I'd be happy to stop there, and revisit if we get a feature report.

agoose77 avatar Mar 15 '24 10:03 agoose77