Support nb.vectorize without type annotations
Description of new feature
This was bugging me for an hour or so, until I realised that it has to do with the order of tests executed ;)
I have a function that I need to make compatible with Numba so that it accepts either a single integer or an array of integers. Luckily there is @nb.vectorize, since type checking (e.g. via isinstance) is not possible inside Numba-JITted functions, and doing it outside is not an option either because I want this function to be generally usable in a Numba context.
Long story short, the MWE is pretty simple:
import numba as nb
import numpy as np
import awkward as ak
@nb.vectorize
def f(x):
    return x
Which will fail with
>>> f(ak.Array([1,2,3]))
...
...
...
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /var/folders/84/mcvklq757tq1nfrkbxvvbq8m0000gn/T/ipykernel_92463/551455230.py (3)
File "../../../../var/folders/84/mcvklq757tq1nfrkbxvvbq8m0000gn/T/ipykernel_92463/551455230.py", line 3:
<source missing, REPL/exec in use?>
but it compiles successfully and works with
>>> f(np.array([1,2,3]))
and eventually (once it has run with a np.array instance) it also works with Awkward:
>>> f(ak.Array([1,2,3]))
This means that only the type inference stage in Numba failed; once Numba was able to figure out the type with a NumPy array, it understood that the Awkward Array has the same type and ran the correct compiled (and cached) function.
If I call f again with a yet unseen Awkward type:
>>> f(ak.values_astype(ak.Array([1,2,3]), "float"))
I get the same inference error, which can again be fixed by calling it with a similar NumPy array type:
>>> f(np.array([1,2,3], dtype="float"))
My current workaround is to call the function with a NumPy array of each expected type right after the function definition.
If f(ak.Array([1,2,3])) didn't work at first (with "non-precise type pyobject" specifically) and then worked later, I suspect that the problem is that ak._connect._numba.register() is not getting called. This is what triggers the definition of all the Numba handlers for Awkward Arrays, so that Numba recognizes them as something other than vanilla Python objects.
This ak._connect._numba.register() is supposed to be called by Numba when Numba starts up with import numba. This happens through a Python setuptools entry point that Numba has provided: when import numba is invoked, it runs all of the entry points that other packages, such as Awkward Array, have registered.
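One way to check whether the entry point is visible in a given environment is to list what is registered, using only the standard library. This is a rough sketch: it assumes the entry point group is named numba_extensions (the group Numba documents for extensions), and the helper name is made up for illustration.

```python
from importlib.metadata import entry_points


def numba_extension_entry_points():
    """Return (name, target) pairs registered in the 'numba_extensions' group.

    Hypothetical helper for debugging; not part of Numba or Awkward Array.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable interface.
        selected = eps.select(group="numba_extensions")
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get("numba_extensions", [])
    return [(ep.name, ep.value) for ep in selected]


for name, target in numba_extension_entry_points():
    print(name, "->", target)
```

If nothing that points at Awkward Array shows up in the output, the installation metadata is the likely culprit rather than the library code.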
I don't understand all the aspects of this mechanism, but it depends on the global state of the directories that pip and Python look at. There have been times when one version of awkward didn't uninstall cleanly on my development laptop and I've gotten this error. It's not a pure function of the Python library; it depends on this extra information that pip sets up. If you're running awkward out of a local directory and haven't pip-installed it, for instance, it would always fail in this way. The way that I've fixed it in the past has been to pip uninstall awkward until there are no versions of it left and then do one clean pip install of the desired version.
I don't know whether you're seeing this error in your development environment or in continuous testing—I'd be more surprised about seeing it in continuous testing because that's usually a fresh install and can't be as easily broken. If it's in a development environment, it's more likely to affect intensive users than casual users because of all the updates that we do. I don't know if using venv strictly would make this 100% clean.
In any event, a much better work-around if you just want to work around it is
ak._connect._numba.register()
If calling that before any Numba compilation of Awkward Arrays (i.e. concrete types, triggering an actual JIT) doesn't solve it, then something deeper is wrong.
Thanks Jim for the quick answer and detailed explanation.
Unfortunately, calling ak._connect._numba.register() right after import awkward as ak does not help (import numba was done before importing awkward); it gives the same error and works after calling with a NumPy array.
OK, so the actual problem is then maybe related to pip, since on my setup Awkward is installed via Conda. I have not yet fully understood how the entry point hack works, but I suspect that the conda-forge recipe is not picking it up (this is just a wild guess; I have really little idea about the mechanics). The reason I use conda is that on my current working machine (M1 MacBook), numpy and numba are not pip-installable.
I just checked on one of our batch farms (Lyon CC IN2P3), where everything is installed via pip/venv, and the behaviour is similar, so maybe my pip theory is flawed ;)
>>> import numba as nb
... import numpy as np
... import awkward as ak
...
... @nb.vectorize
... def f(x):
...     return x
...
>>> f(ak.Array([1,2,3]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-7e4d70666b5e> in <module>
----> 1 f(ak.Array([1,2,3]))
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_args(self, *args, **kws)
177 argtys = []
178 for arg in args[:nin]:
--> 179 argty = typeof(arg)
180 if isinstance(argty, types.Array):
181 argty = argty.dtype
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/typing/typeof.py in typeof(val, purpose)
31 msg = _termcolor.errmsg(
32 "cannot determine Numba type of %r") % (type(val),)
---> 33 raise ValueError(msg)
34 return ty
35
ValueError: cannot determine Numba type of <class 'awkward.highlevel.Array'>
>>> f(np.array([1,2,3]))
array([1, 2, 3])
>>> f(ak.Array([1,2,3]))
<Array [1, 2, 3] type='3 * int64'>
>>> ak.__version__
'1.5.1'
>>> nb.__version__
'0.50.1'
Well, ak._connect._numba.register() does exactly what the entry point does, so if calling that registration function explicitly doesn't work as a work-around, then it's indicating that something else is the real problem.
There was some minimum Numba version that supported the entry point—I submitted a bug report about it being called too late and that was fixed in some version. I'm using Numba version 0.54.1 right now (the latest from conda-forge). Can you try that? 0.50 sounds close to the threshold when some of these things were fixed.
On my M1 Mac, I have 0.54.1:
In [1]: import numba as nb
In [2]: nb.__version__
Out[2]: '0.54.1'
I updated Numba in our computing centre (CentOS 7) to 0.54.1 but still get the same error.
I noticed, however, that when I do not call ak._connect._numba.register(), I get this:
>>> f(ak.Array([1,2,3]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-7e4d70666b5e> in <module>
----> 1 f(ak.Array([1,2,3]))
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_args(self, *args, **kws)
186 argtys = []
187 for arg in args[:nin]:
--> 188 argty = typeof(arg)
189 if isinstance(argty, types.Array):
190 argty = argty.dtype
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/typing/typeof.py in typeof(val, purpose)
33 msg = _termcolor.errmsg(
34 f"Cannot determine Numba type of {type(val)}")
---> 35 raise ValueError(msg)
36 return ty
37
ValueError: Cannot determine Numba type of <class 'awkward.highlevel.Array'>
and after calling the register function, I get this:
>>> ak._connect._numba.register()
>>> f(ak.Array([1,2,3]))
<ipython-input-4-a13cf829227f>:1: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "f" failed type inference due to: non-precise type pyobject
During: typing of argument at <ipython-input-4-a13cf829227f> (3)
File "<ipython-input-4-a13cf829227f>", line 3:
def f(x):
return x
^
@nb.vectorize
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/object_mode_passes.py:152: NumbaWarning: Function "f" was compiled in object mode without forceobj=True.
File "<ipython-input-4-a13cf829227f>", line 2:
@nb.vectorize
def f(x):
^
state.func_ir.loc))
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/core/object_mode_passes.py:162: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.
For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit
File "<ipython-input-4-a13cf829227f>", line 2:
@nb.vectorize
def f(x):
^
state.func_ir.loc))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-7e4d70666b5e> in <module>
----> 1 f(ak.Array([1,2,3]))
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_args(self, *args, **kws)
200 argty = numpy_support.map_arrayscalar_type(arg)
201 argtys.append(argty)
--> 202 return self._compile_for_argtys(tuple(argtys))
203
204 def _compile_for_argtys(self, argtys, return_type=None):
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/dufunc.py in _compile_for_argtys(self, argtys, return_type)
220 self._dispatcher, self.targetoptions, sig)
221 actual_sig = ufuncbuilder._finalize_ufunc_signature(
--> 222 cres, argtys, return_type)
223 dtypenums, ptr, env = ufuncbuilder._build_element_wise_ufunc_wrapper(
224 cres, actual_sig)
/pbs/throng/km3net/software/python/3.7.5/lib/python3.7/site-packages/numba/np/ufunc/ufuncbuilder.py in _finalize_ufunc_signature(cres, args, return_type)
183 if cres.objectmode:
184 # Object mode is used and return type is not specified
--> 185 raise TypeError("return type must be specified for object mode")
186 else:
187 return_type = cres.signature.return_type
TypeError: return type must be specified for object mode
>>> import numpy as np
>>> f(np.array([1,2,3]))
array([1, 2, 3])
>>> f(ak.Array([1,2,3]))
<Array [1, 2, 3] type='3 * int64'>
Huh, so ak._connect._numba.register() is doing what it's supposed to: Numba wasn't recognizing ak.Array before it was called, and has a different error afterward. (Mystery number 1: why aren't the entry points working?)
For mystery number 2: does nb.njit work but nb.vectorize not work? I haven't done much testing with nb.vectorize.
Yes, I confirm that @nb.njit works. So it's related to nb.vectorize. My initial thought was that some introspection in the type inference chokes on something in Awkward. nb.vectorize will presumably inspect element types etc., but I am really not that comfortable with Numba internals.
Actually, @nb.vectorize must be expecting the arguments to be arrays, so that it can compile in a loop over those arrays. Awkward Arrays are not recognized in Numba as ArrayLike, but only as Iterable. (ArrayLike would require us to produce a shape and dtype, which would only be possible for rectilinear ones, and that's value information, not type information.)
I've used @vectorize to make ufuncs that Awkward Array has then caught and used like a NumPy ufunc, which is a very different code path. To do that, however, the types have to be given in the @vectorize decorator so that it can be compiled: only compiled, ready-to-run ufuncs satisfy NEP13 (call ak.Array.__array_ufunc__).
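To illustrate the NEP 13 code path in isolation, here is a small NumPy-only sketch. The Wrapped class is a hypothetical stand-in for ak.Array, not Awkward Array's actual implementation; it only shows how a real ufunc hands control to an argument's __array_ufunc__ method.

```python
import numpy as np


class Wrapped:
    """Hypothetical container standing in for ak.Array (illustration only)."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.square(x) are handled in this sketch.
        if method != "__call__":
            return NotImplemented
        # Unwrap our own type, apply the ufunc to the raw data, and rewrap.
        unwrapped = [x.values if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*unwrapped, **kwargs))


# np.square is a true ufunc, so NumPy defers to Wrapped.__array_ufunc__.
result = np.square(Wrapped([1, 2, 3]))
print(type(result).__name__, result.values)
```

The dispatch only happens because np.square is already a real ufunc object, which is the same reason the decorator needs enough type information to produce one.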
Does it work if you give @vectorize type info? It would be the data types of an element, so the types might be [nb.float64, nb.int32], for instance.
Yes! With e.g. @nb.vectorize("int64(int64)") it works, but I have to specify all cases. I will have a look if this works for the actual use-case.
Alright, I can live with explicit type annotations; at least I managed to cover all the expected cases.
Do you want to close this for now or are you going to look deeper?
We still don't know why registration didn't happen automatically, but that's an installation thing—I don't think the problem is inside the codebase.
As for explicit annotations in vectorize, that's something I knew about but didn't connect to your case right away. This is labeled as a feature request; I'll change the title to make it more explicit. It might require a change on Numba's side (e.g. have uncompiled vectorized functions check for an __array_ufunc__ method and pass itself to that, optimistically).
I addressed this in Numba (numba/numba#8995), which has been released in Numba 0.59.0dev0, 0.58.1, 0.58.0, 0.58.0rc2, and 0.58.0rc1.
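For code that has to run on both older and newer installations, a version check along these lines could decide whether the NumPy warm-up or explicit signatures are still needed. It is a sketch under the assumption that every release from 0.58.0rc1 onward contains the fix, as the list above suggests; both helper names are hypothetical.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '0.58.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_untyped_vectorize_fix(numba_version):
    """Guess whether this Numba version includes the fix (assumed: >= 0.58)."""
    return parse_major_minor(numba_version) >= (0, 58)


for version in ("0.50.1", "0.54.1", "0.58.0rc1", "0.59.0dev0"):
    print(version, has_untyped_vectorize_fix(version))
```

In practice the string passed in would be nb.__version__.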
It's good now!
>>> @nb.vectorize
... def f(x):
...     return x
...
>>> f(ak.Array([1,2,3]))
<Array [1, 2, 3] type='3 * int64'>
>>> @nb.vectorize
... def f(x):
...     return x**2
...
>>> f(ak.Array([1,2,3]))
<Array [1, 4, 9] type='3 * int64'>