
Support nb.vectorize without type annotations

Open · tamasgal opened this issue 3 years ago · 10 comments

Description of new feature

This was bugging me for an hour or so, until I realised that it has to do with the order of tests executed ;)

I have a function that I need to make compatible with Numba so that it accepts either a single value (an integer) or an array of integers. Luckily there is @nb.vectorize, because type checking (e.g. via isinstance) is not possible inside Numba-JITted functions, and doing it outside is not an option either, since I want the function to be generally usable in a Numba context.

Long story short, the MWE is pretty simple:

import numba as nb
import numpy as np
import awkward as ak

@nb.vectorize
def f(x):
    return x

Which will fail with

>>> f(ak.Array([1,2,3]))
...
...
...
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /var/folders/84/mcvklq757tq1nfrkbxvvbq8m0000gn/T/ipykernel_92463/551455230.py (3)

File "../../../../var/folders/84/mcvklq757tq1nfrkbxvvbq8m0000gn/T/ipykernel_92463/551455230.py", line 3:
<source missing, REPL/exec in use?>

but it compiles and works fine with

>>> f(np.array([1,2,3]))

and afterwards (once it has run with an np.array instance) it also works with Awkward

>>> f(ak.Array([1,2,3]))

This means that only Numba's type inference stage failed: once it had figured out the types from a NumPy array, it understood that the Awkward array has the same type and ran the correct compiled (and cached) function.
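A toy model in plain Python reproduces the pattern described above (the names `LazyDispatcher` and `Wrapped` are hypothetical, and this is not Numba's implementation): each element type gets its own "compiled" loop on first use, and an argument the toy compiler can't type can still reuse a loop that already exists. In the real case the reuse plausibly goes through NumPy's ufunc machinery rather than a direct type match, as discussed later in the thread.

```python
class Wrapped:
    """Stand-in for a container the toy 'compiler' cannot type (like ak.Array here)."""

    def __init__(self, data):
        self.data = list(data)


class LazyDispatcher:
    """Toy lazily 'compiled' elementwise function with a per-element-type cache."""

    def __init__(self, func):
        self.func = func
        self.loops = {}  # element type -> specialized loop

    def _compile(self, elem_type):
        # Pretend compilation: build a loop specialized for one element type.
        func = self.func

        def loop(values):
            return [func(elem_type(v)) for v in values]

        self.loops[elem_type] = loop
        return loop

    def __call__(self, arg):
        if isinstance(arg, Wrapped):
            # A Wrapped argument may reuse an existing loop, but it cannot
            # be typed precisely enough to trigger a new compilation.
            elem_type = type(arg.data[0])
            if elem_type not in self.loops:
                raise TypeError("non-precise type: cannot compile for Wrapped")
            return Wrapped(self.loops[elem_type](arg.data))
        # Plain lists are typed from their first element and compiled on demand.
        elem_type = type(arg[0])
        loop = self.loops.get(elem_type) or self._compile(elem_type)
        return loop(arg)


f = LazyDispatcher(lambda x: x * 2)
# f(Wrapped([1, 2, 3])) would raise TypeError here: no int loop exists yet.
f([1, 2, 3])                        # compiles and caches the int loop
print(f(Wrapped([1, 2, 3])).data)   # reuses it: [2, 4, 6]
```

Calling `f(Wrapped([1.0, 2.0]))` afterwards would fail again, mirroring the float case in the issue, until a plain list of floats has been passed once.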

If I call f again with a yet unseen Awkward type:

>>> f(ak.values_astype(ak.Array([1,2,3]), "float"))

I get the same inference error, which can again be fixed by calling it with a NumPy array of the corresponding type:

>>> f(np.array([1,2,3], dtype="float"))

My current workaround is to call the function with a NumPy array of each expected type right after its definition.

tamasgal, Nov 04 '21
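The warm-up workaround described in the issue can be wrapped in a small helper. `warm_up` is a hypothetical name, not part of Numba or Awkward; it simply feeds the function one NumPy array per expected dtype, which for a lazily compiled `@nb.vectorize` function compiles and caches one loop per dtype.

```python
import numpy as np


def warm_up(func, dtypes, sample=(1, 2, 3)):
    """Call func once with a NumPy array of each dtype, then return func.

    Hypothetical helper: for a lazily compiled @nb.vectorize function,
    each new dtype triggers compilation of a loop that is then cached.
    """
    for dtype in dtypes:
        func(np.asarray(sample, dtype=dtype))
    return func
```

With Numba installed, `f = warm_up(f, ["int64", "float64"])` right after the `@nb.vectorize` definition would do the same as the manual `f(np.array(...))` calls.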

If f(ak.Array([1,2,3])) didn't work at first (with "non-precise type pyobject" specifically) and then worked later, I suspect that the problem is that ak._connect._numba.register() is not getting called. This is what triggers the definition of all the Numba handlers for Awkward Arrays, so that it recognizes them as something other than vanilla Python objects.

ak._connect._numba.register() is supposed to be called by Numba when it starts up, i.e. on import numba. This happens through a Python setuptools entry point that Numba provides: when import numba is executed, it runs all of the entry points that other packages, such as Awkward Array, have registered.
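Numba looks for these extensions in the `numba_extensions` entry-point group (see Numba's documentation on automatic extension registration). A quick way to check what an environment actually advertises is to list that group with the standard library's `importlib.metadata`. If no entry pointing at Awkward's registration function (presumably `awkward._connect._numba:register` for this Awkward version) shows up, automatic registration cannot happen. The helper name below is hypothetical.

```python
from importlib.metadata import entry_points


def numba_extension_entry_points():
    """Return (name, value) pairs registered in the 'numba_extensions' group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        group = eps.select(group="numba_extensions")
    else:  # Python 3.8/3.9: a dict of group name -> entry points
        group = eps.get("numba_extensions", [])
    return [(ep.name, ep.value) for ep in group]


for name, value in numba_extension_entry_points():
    print(name, "=", value)
```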

I don't understand all the aspects of this mechanism, but it depends on the global state of the directories that pip and Python look at. There have been times when one version of awkward didn't uninstall cleanly on my development laptop and I've gotten this error. It's not a pure function of the Python library—it depends on this extra information that pip sets up. If you're running awkward out of a local directory and haven't pip-installed it, for instance, it would always fail in this way. The way that I've fixed it in the past has been to pip uninstall awkward until there are no versions of it left and then do one clean pip install of the desired version.
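Leftover installations of the kind described above can sometimes be spotted from Python itself. The sketch below (hypothetical helper name) counts installed distribution metadata by normalized project name and reports any name that appears more than once, which is one symptom of an incomplete uninstall. It only sees directories on the current `sys.path`, and a directory listed twice on `sys.path` would also be flagged, so treat it as a hint rather than a diagnosis.

```python
import re
from collections import Counter
from importlib.metadata import distributions


def duplicated_distributions():
    """Map normalized project name -> count, for names installed more than once."""
    counts = Counter()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # broken metadata can lack a Name field
            # PEP 503 normalization, so 'Awkward_Array' and 'awkward-array' match
            counts[re.sub(r"[-_.]+", "-", name).lower()] += 1
    return {name: n for name, n in counts.items() if n > 1}


print(duplicated_distributions().get("awkward", 0))
```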

I don't know whether you're seeing this error in your development environment or in continuous testing—I'd be more surprised about seeing it in continuous testing because that's usually a fresh install and can't be as easily broken. If it's in a development environment, it's more likely to affect intensive users than casual users because of all the updates that we do. I don't know if using venv strictly would make this 100% clean.

In any event, if you just want a work-around, a much better one is

ak._connect._numba.register()

If calling that before any Numba compilation of Awkward Arrays (i.e. concrete types, triggering an actual JIT) doesn't solve it, then something deeper is wrong.

jpivarski, Nov 04 '21

Thanks Jim for the quick answer and detailed explanation.

Unfortunately, calling ak._connect._numba.register() right after import awkward as ak does not help (numba was imported before awkward): I get the same error, and f only works after it has been called with a NumPy array.

OK, so maybe the actual problem is related to pip after all, since on my setup Awkward is installed via conda. I haven't fully understood yet how the entry point hack works, but I suspect that the conda-forge recipe is not picking it up (this is just a wild guess; I have very little idea about the mechanics). I use conda because numpy and numba are not pip-installable on my current working machine (an M1 MacBook).

I just checked on one of our batch farms (Lyon CC IN2P3), where everything is installed via pip/venv, and the behaviour is similar, so maybe my pip theory is flawed ;)

>>> import numba as nb
... import numpy as np
... import awkward as ak
...
... @nb.vectorize
... def f(x):
...     return x
...

>>> f(ak.Array([1,2,3]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-7e4d70666b5e> in <module>
----> 1 f(ak.Array([1,2,3]))

/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_args(self, *args, **kws)
    177         argtys = []
    178         for arg in args[:nin]:
--> 179             argty = typeof(arg)
    180             if isinstance(argty, types.Array):
    181                 argty = argty.dtype

/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/typing/typeof.py in typeof(val, purpose)
     31         msg = _termcolor.errmsg(
     32             "cannot determine Numba type of %r") % (type(val),)
---> 33         raise ValueError(msg)
     34     return ty
     35

ValueError: cannot determine Numba type of <class 'awkward.highlevel.Array'>

>>> f(np.array([1,2,3]))
array([1, 2, 3])

>>> f(ak.Array([1,2,3]))
<Array [1, 2, 3] type='3 * int64'>

>>> ak.__version__
'1.5.1'

>>> nb.__version__
'0.50.1'

tamasgal, Nov 04 '21

Well, ak._connect._numba.register() does exactly what the entry point does, so if calling that registration function explicitly doesn't work as a work-around, then it's indicating that something else is the real problem.

There was some minimum Numba version that supported the entry point—I submitted a bug report about it being called too late and that was fixed in some version. I'm using Numba version 0.54.1 right now (the latest from conda-forge). Can you try that? 0.50 sounds close to the threshold when some of these things were fixed.

jpivarski, Nov 04 '21
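To compare an environment against the Numba version mentioned above, a small check like the following can help. The helper names are hypothetical, 0.54.1 is simply the version Jim reports using (not a documented minimum), and `packaging.version` would be the more robust choice where it is installed.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(text):
    """Leading numeric release parts, e.g. '0.59.0dev0' -> (0, 59, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop at suffixes like 'dev0' or 'rc1'
            break
    return tuple(parts)


def installed_at_least(dist_name, minimum):
    """True if dist_name is installed with a release >= minimum."""
    try:
        return release_tuple(version(dist_name)) >= minimum
    except PackageNotFoundError:
        return False


print(installed_at_least("numba", (0, 54, 1)))
```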

On my M1 Mac, I have 0.54.1:

In [1]: import numba as nb
In [2]: nb.__version__
Out[2]: '0.54.1'

I updated Numba in our computing centre (CentOS 7) to 0.54.1, but I still get the same error.

I noticed, however, that when I don't call ak._connect._numba.register(), I get this:

>>> f(ak.Array([1,2,3]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-7e4d70666b5e> in <module>
----> 1 f(ak.Array([1,2,3]))

/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_args(self, *args, **kws)
    186         argtys = []
    187         for arg in args[:nin]:
--> 188             argty = typeof(arg)
    189             if isinstance(argty, types.Array):
    190                 argty = argty.dtype

/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/typing/typeof.py in typeof(val, purpose)
     33         msg = _termcolor.errmsg(
     34             f"Cannot determine Numba type of {type(val)}")
---> 35         raise ValueError(msg)
     36     return ty
     37

ValueError: Cannot determine Numba type of <class 'awkward.highlevel.Array'>

and after calling the register function, I get this:

>>> ak._connect._numba.register()

>>> f(ak.Array([1,2,3]))
<ipython-input-4-a13cf829227f>:1: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "f" failed type inference due to: non-precise type pyobject
During: typing of argument at <ipython-input-4-a13cf829227f> (3)

File "<ipython-input-4-a13cf829227f>", line 3:
def f(x):
    return x
    ^

  @nb.vectorize
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "f" was compiled in object mode without forceobj=True.

File "<ipython-input-4-a13cf829227f>", line 2:
@nb.vectorize
def f(x):
^

  state.func_ir.loc))
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "<ipython-input-4-a13cf829227f>", line 2:
@nb.vectorize
def f(x):
^

  state.func_ir.loc))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-7e4d70666b5e> in <module>
----> 1 f(ak.Array([1,2,3]))

/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_args(self, *args, **kws)
    200                 argty = numpy_support.map_arrayscalar_type(arg)
    201             argtys.append(argty)
--> 202         return self._compile_for_argtys(tuple(argtys))
    203
    204     def _compile_for_argtys(self, argtys, return_type=None):

/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_argtys(self, argtys, return_type)
    220             self._dispatcher, self.targetoptions, sig)
    221         actual_sig = ufuncbuilder._finalize_ufunc_signature(
--> 222             cres, argtys, return_type)
    223         dtypenums, ptr, env = ufuncbuilder._build_element_wise_ufunc_wrapper(
    224             cres, actual_sig)

/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/ufuncbuilder.py in _finalize_ufunc_signature(cres, args, return_type)
    183         if cres.objectmode:
    184             # Object mode is used and return type is not specified
--> 185             raise TypeError("return type must be specified for object mode")
    186         else:
    187             return_type = cres.signature.return_type

TypeError: return type must be specified for object mode

>>> import numpy as np

>>> f(np.array([1,2,3]))
array([1, 2, 3])

>>> f(ak.Array([1,2,3]))
<Array [1, 2, 3] type='3 * int64'>

tamasgal, Nov 04 '21

Huh, so ak._connect._numba.register() is doing what it's supposed to: Numba didn't recognize ak.Array before it was called, and gives a different error afterward. (Mystery number 1: why aren't the entry points working?)

For mystery number 2: does nb.njit work but nb.vectorize not work? I haven't done much testing with nb.vectorize.

jpivarski, Nov 04 '21

Yes, I can confirm that @nb.njit works, so it's related to nb.vectorize. My initial thought was that some introspection in the type inference chokes on something in Awkward. nb.vectorize presumably inspects element types etc., but I am really not that comfortable with Numba internals.

tamasgal, Nov 04 '21

Actually, @nb.vectorize must be expecting the arguments to be arrays, so that it can compile in a loop over those arrays. Awkward Arrays are not recognized in Numba as ArrayLike, but only as Iterable. (ArrayLike would require us to produce a shape and dtype, which would only be possible for rectilinear ones, and that's value information, not type information.)
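The point that a shape is value information rather than type information can be illustrated with a small pure-Python function (hypothetical, not Awkward's implementation). Built from Python lists, both example inputs below would get a variable-length ("var") inner dimension in Awkward, so the type alone doesn't say whether a regular shape exists; only inspecting the actual lengths does.

```python
def regular_shape(data):
    """Return the shape of a nested list if it is rectilinear, else None."""
    if not isinstance(data, list):
        return ()  # a scalar has an empty shape
    inner = {regular_shape(item) for item in data}
    if len(inner) > 1 or None in inner:
        return None  # sublists differ in length or depth: jagged
    return (len(data),) + (inner.pop() if inner else ())


print(regular_shape([[1, 2, 3], [4, 5, 6]]))   # (2, 3)
print(regular_shape([[1, 2, 3], [], [4, 5]]))  # None
```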

I've used @vectorize to make ufuncs that Awkward Array then catches and uses like NumPy ufuncs, which is a very different code path. To do that, however, the types have to be given in the @vectorize decorator so that the function is compiled up front: only compiled, ready-to-run ufuncs satisfy NEP 13 (i.e. NumPy calls ak.Array.__array_ufunc__ for them).
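NEP 13 is the protocol behind this: when a NumPy ufunc is called on an object that defines `__array_ufunc__`, NumPy hands the call to that method instead of converting the object to an array. A minimal sketch with a hypothetical `Boxed` container (it only shows the protocol; Awkward's real implementation also handles nested, jagged structure):

```python
import numpy as np


class Boxed:
    """Hypothetical container that intercepts NumPy ufuncs via NEP 13."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        # Unwrap Boxed inputs, apply the ready-to-run ufunc, rewrap the result.
        raw = [x.values if isinstance(x, Boxed) else x for x in inputs]
        return Boxed(ufunc(*raw, **kwargs))


result = np.negative(Boxed([1, 2, 3]))
print(type(result).__name__, result.values)  # Boxed [-1 -2 -3]

# The same dispatch happens for ufuncs created at runtime:
square = np.frompyfunc(lambda x: x * x, 1, 1)
print(square(Boxed([1, 2, 3])).values.tolist())  # [1, 4, 9]
```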

Does it work if you give @vectorize type info? It would be the data types of an element, so the types might be [nb.float64, nb.int32], for instance.

jpivarski, Nov 04 '21

Yes! With e.g. @nb.vectorize("int64(int64)") it works, but I have to specify all cases. I will have a look if this works for the actual use-case.

tamasgal, Nov 04 '21
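When every expected case has to be listed, the signature strings can be generated instead of typed by hand. The helper below (hypothetical name) builds same-type-in, same-type-out signatures like the "int64(int64)" one above; passing the resulting list to `@nb.vectorize(...)` makes Numba compile all of those loops eagerly, at decoration time.

```python
def unary_signatures(type_names):
    """Build Numba signature strings of the form 'T(T)' for each type name."""
    return [f"{name}({name})" for name in type_names]


sigs = unary_signatures(["int32", "int64", "float32", "float64"])
print(sigs)
# With Numba installed, the list can be passed straight to the decorator:
#     @nb.vectorize(sigs)
#     def f(x):
#         return x
```

Order matters here: NumPy picks the first loop that the inputs can be safely cast to, so narrower types are usually listed before wider ones.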

Alright, I can live with explicit type annotations, at least I managed to cover all the expected cases.

Do you want to close this for now or are you going to look deeper?

tamasgal, Nov 05 '21

We still don't know why registration didn't happen automatically, but that's an installation thing—I don't think the problem is inside the codebase.

As for explicit annotations in vectorize, that's something I knew about but didn't connect to your case right away. This is labeled as a feature request; I'll change the title to make it more explicit. It might require a change on Numba's side (e.g. having uncompiled vectorized functions check for an __array_ufunc__ method and optimistically pass themselves to it).

jpivarski, Nov 05 '21
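Jim's suggestion, that an uncompiled vectorized function check its arguments for `__array_ufunc__` and optimistically hand itself over, can be sketched in plain Python with NumPy. Every name here is hypothetical and this is not how Numba implemented the fix; `np.frompyfunc` stands in for real JIT compilation.

```python
import numpy as np


class LazyUfunc:
    """Hypothetical lazily 'compiled' elementwise function."""

    def __init__(self, func, nin=1):
        self.func = func
        self.nin = nin
        self._compiled = None

    def _compile(self):
        # Stand-in for JIT compilation: build a real (object-dtype) ufunc once.
        if self._compiled is None:
            self._compiled = np.frompyfunc(self.func, self.nin, 1)
        return self._compiled

    def __call__(self, *args, **kwargs):
        for arg in args:
            override = getattr(type(arg), "__array_ufunc__", None)
            if override is not None and not isinstance(arg, np.ndarray):
                # Optimistically let the argument's type handle the call.
                return override(arg, self, "__call__", *args, **kwargs)
        return self._compile()(*args, **kwargs)


class Boxed:
    """Hypothetical container that unwraps, applies the function, and rewraps."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        raw = [x.values if isinstance(x, Boxed) else x for x in inputs]
        return Boxed(ufunc(*raw, **kwargs))


square = LazyUfunc(lambda x: x * x)
print(square(Boxed([1, 2, 3])).values.tolist())  # [1, 4, 9]
```

The container receives the still-uncompiled wrapper, calls it on plain NumPy data, and only at that point does the wrapper "compile".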

I addressed this in Numba (numba/numba#8995), which has been released in Numba 0.58.0rc1, 0.58.0rc2, 0.58.0, 0.58.1, and 0.59.0dev0.

jpivarski, Dec 08 '23

It's good now!

>>> @nb.vectorize
... def f(x):
...     return x
... 
>>> f(ak.Array([1,2,3]))
<Array [1, 2, 3] type='3 * int64'>

>>> @nb.vectorize
... def f(x):
...     return x**2
... 
>>> f(ak.Array([1,2,3]))
<Array [1, 4, 9] type='3 * int64'>

jpivarski, Jan 20 '24