`raise` and `assert` do not work correctly in CUDA device functions
The following:
from numba import cuda

@cuda.jit(device=True)
def f():
    raise ValueError('Error')

@cuda.jit(debug=True, opt=False)
def kernel():
    f()

kernel[1, 1]()
results in an error like:
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/issues/dispatcher-refactor/raise_in_devfunc.py", line 14, in <module>
    kernel[1, 1]()
  File "/home/gmarkall/mambaforge/envs/numba0531/lib/python3.9/site-packages/numba/cuda/compiler.py", line 821, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
  File "/home/gmarkall/mambaforge/envs/numba0531/lib/python3.9/site-packages/numba/cuda/compiler.py", line 966, in call
    kernel.launch(args, griddim, blockdim, stream, sharedmem)
  File "/home/gmarkall/mambaforge/envs/numba0531/lib/python3.9/site-packages/numba/cuda/compiler.py", line 716, in launch
    exccls, exc_args, loc = self.call_helper.get_exception(code)
ValueError: not enough values to unpack (expected 3, got 2)
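The unpacking failure in the last frame can be reproduced without a GPU. The following is a minimal pure-Python sketch, not Numba's code: `TwoTupleHelper` and `unpack_like_launch` are hypothetical names that only model a fallback path returning two values to a caller that unpacks three.

```python
# Hypothetical stand-in for a call helper whose fallback path returns
# two values; it models the shape mismatch only, not Numba's internals.
class TwoTupleHelper:
    def __init__(self):
        self.exceptions = {}  # exc_id -> (exccls, exc_args, loc)

    def get_exception(self, exc_id):
        try:
            return self.exceptions[exc_id]
        except KeyError:
            msg = "unknown error %d in native function" % exc_id
            return SystemError, (msg,)  # only two values


def unpack_like_launch(helper, code):
    # Mirrors the three-way unpacking shown in the traceback above.
    exccls, exc_args, loc = helper.get_exception(code)
    return exccls, exc_args, loc


try:
    unpack_like_launch(TwoTupleHelper(), 1)
    unpack_error = None
except ValueError as e:
    unpack_error = str(e)

print(unpack_error)
```

In this sketch the lookup misses because nothing was registered under the code, so the two-value fallback is what reaches the caller.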
If I fix `_MinimalCallHelper` so that reporting the error works correctly (`get_exception()` should return three things, not two):
diff --git a/numba/core/callconv.py b/numba/core/callconv.py
index b3fa6cbbd..e99494524 100644
--- a/numba/core/callconv.py
+++ b/numba/core/callconv.py
@@ -315,7 +315,10 @@ class _MinimalCallHelper(object):
             return self.exceptions[exc_id]
         except KeyError:
             msg = "unknown error %d in native function" % exc_id
-            return SystemError, (msg,)
+            exc = SystemError
+            exc_args = (msg,)
+            locinfo = None
+            return exc, exc_args, locinfo
 
 # The structure type constructed by PythonAPI.serialize_uncached()
 # i.e a {i8* pickle_buf, i32 pickle_bufsz, i8* hash_buf}
then the exception becomes:
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/issues/dispatcher-refactor/raise_in_devfunc.py", line 14, in <module>
    kernel[1, 1]()
  File "/home/gmarkall/numbadev/numba/numba/cuda/dispatcher.py", line 473, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
  File "/home/gmarkall/numbadev/numba/numba/cuda/dispatcher.py", line 581, in call
    kernel.launch(args, griddim, blockdim, stream, sharedmem)
  File "/home/gmarkall/numbadev/numba/numba/cuda/dispatcher.py", line 306, in launch
    raise exccls(*exc_args)
SystemError: tid=[0, 0, 0] ctaid=[0, 0, 0]: unknown error 1 in native function
Thoughts:
- It seems that two `MinimalCallConv` objects, and therefore two `_MinimalCallHelper` objects, get created during compilation.
- The details of the exception get added to one of these (through `return_user_exc()` / `_add_exception()`), but when it's time to look up the exception (through `get_exception()`), the other one is used.
- `BaseCallConv.init_call_helper()`, which calls `self._make_call_helper()` to make the call helper, stashes the call helper it gets on the builder object: https://github.com/numba/numba/blob/0f5953df2a2ee6eea1ac44b974bcd954aae91bc6/numba/core/callconv.py#L105-L114
- It looks like the idea here is to somehow memoize the call helper on the builder so that it can be reused across multiple call conv objects (for example, so that they can all have a common dict of exception info in the call helper), but as far as I can see the two call convs have two separate builders, so there isn't the possibility of sharing anyway.
- The next point of investigation is to see how the CPU target handles this case (one njit function calling another that raises an exception): how does it ensure consistency of information about exceptions between the two functions?
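To make the suspected mechanism concrete, here is a small pure-Python model of it. It is a sketch under assumptions, not Numba's implementation: `Builder`, `CallHelper`, `CallConv`, `add_exception` and `get_call_helper` are hypothetical stand-ins, and the `_call_helper` attribute only approximates how the helper is stashed on the builder.

```python
class Builder:
    """Hypothetical stand-in for an IR builder (one per compiled function)."""


class CallHelper:
    """Hypothetical helper holding a private dict of exception info."""

    def __init__(self):
        self.exceptions = {}

    def add_exception(self, exc, exc_args, locinfo):
        exc_id = len(self.exceptions) + 1
        self.exceptions[exc_id] = (exc, exc_args, locinfo)
        return exc_id

    def get_exception(self, exc_id):
        try:
            return self.exceptions[exc_id]
        except KeyError:
            msg = "unknown error %d in native function" % exc_id
            return SystemError, (msg,), None


class CallConv:
    def init_call_helper(self, builder):
        # Assumed behaviour: create a helper and stash it on the builder.
        ch = CallHelper()
        builder._call_helper = ch
        return ch

    def get_call_helper(self, builder):
        # Any call conv holding the same builder sees the same helper.
        return builder._call_helper


# The device function and the kernel each get their own builder.
device_builder, kernel_builder = Builder(), Builder()
device_helper = CallConv().init_call_helper(device_builder)
kernel_helper = CallConv().init_call_helper(kernel_builder)

# The raise is registered while compiling the device function...
exc_id = device_helper.add_exception(ValueError, ("Error",), None)

# ...but the launch path consults the kernel's helper.
looked_up = kernel_helper.get_exception(exc_id)

# Sharing only works when the builder itself is shared.
shared = CallConv().get_call_helper(device_builder).get_exception(exc_id)
```

In this model `looked_up` falls through to the `SystemError` fallback while `shared` recovers the registered `ValueError`, which matches the "unknown error 1" symptom above if the real helpers are indeed keyed per builder.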
I can relatively easily go back as far as Numba 0.48, in which this also didn't work - I get the impression that this has never worked.