Graham Markall

Results: 455 comments by Graham Markall

> The warnings show up in a non-deterministic fashion. If I re-run the above example, they might show up at different places or, sometimes, not at all. The warnings go...

> The cuda version of this code does some weird stuff (see section between arrays of size `10**3` to `10**4` plus change) and seems to under-perform on sufficiently large datasets...

> What I do not quite understand is the section labeled as "A". The run-time literally plateaus. Is it related to the "complexity of the kernel" you mentioned earlier? I...

Thanks for the reproducer - I can reproduce this issue, and I'm looking into it.

The culprit is 0401550bc7a9e98add3e7d7728c193943d77d75c from PR #7999.

I have just tested the reproducer with https://github.com/numba/numba/pull/8271 - I had hoped it would fix this but it seems not to.

The following also does not help, neither with nor without #8271:

```diff
diff --git a/numba/np/arrayobj.py b/numba/np/arrayobj.py
index 7be350ba0..83bbc0579 100644
--- a/numba/np/arrayobj.py
+++ b/numba/np/arrayobj.py
@@ -2617,7 +2617,7 @@ def array_flags_f_contiguous(context, builder,...
```

One hypothesis: this is happening because the lowering for atomic addition is using the low-level API, which can't see overloads written with the high-level API (as `.real` and `.imag` now...

OK, it's not an issue with typing / resolving the function. The overload implementation is a function that returns an array, which is not supported by the CUDA calling convention:...