stuartarchibald

322 comments by stuartarchibald

> Buildfarm ID: `numba_smoketest_cuda_yaml_157`. Passed (on all non-CUDA-10.2 builds as expected since #8287)! After many months of effort to get to this point... `@overload` works for CUDA! Many thanks @gmarkall,...

Thanks for the report. I can reproduce:

```python
from IPython import get_ipython
ipython = get_ipython()
from numba import njit
import numpy as np

@njit
def sma1(x, d):
    n = np.ones(d)
    weights...
```

@thomasaarholt the code is here if you want to investigate further: https://github.com/numba/numba/blob/3b1e4abd1b82ab5dc60f912ad43faa6e26cd87ba/numba/np/arraymath.py#L4010-L4031. I suspect that for larger data sets the algorithms start to push against hardware limits, so performance becomes asymptotic....

NumPy has a specialisation for "small" arrays that avoids calling the BLAS routines behind `np.dot`. Were this specialisation added to the Numba implementation, it would probably shrink the current performance gap....
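To illustrate the idea of a small-array path, here is a minimal sketch of the dispatch shape only. It is not NumPy's or Numba's actual implementation: the names `small_dot`, `dot_with_small_path` and the cutoff `SMALL_DOT_THRESHOLD` are hypothetical, and the threshold value is an arbitrary assumption.

```python
import numpy as np

# Hypothetical cutoff, chosen for illustration; the size at which a
# library switches to BLAS is not given in the comment above.
SMALL_DOT_THRESHOLD = 16


def small_dot(a, b):
    """Dot product of two equal-length 1-D arrays using a plain loop (no BLAS call)."""
    acc = 0.0
    for i in range(a.shape[0]):
        acc += a[i] * b[i]
    return acc


def dot_with_small_path(a, b):
    """Use the loop for small 1-D inputs and fall back to np.dot otherwise."""
    is_small_1d = (
        a.ndim == 1
        and b.ndim == 1
        and a.shape[0] == b.shape[0]
        and a.shape[0] <= SMALL_DOT_THRESHOLD
    )
    if is_small_1d:
        return small_dot(a, b)
    return np.dot(a, b)
```

In interpreted Python the loop will not beat `np.dot`; any saving from skipping the BLAS call would only show up in compiled code, so this is a picture of the control flow rather than a benchmark.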

Removed in #8335.

> I think I'm ready to approve this, but I first want to confirm my understanding of the issue and patch is correct. I have resisted understanding entirely how the...

> Ah, that explanation makes the error message make way more sense. The only way I could rationalise it was that something inconsistent made a lot of confusion down the...

> @stuartarchibald I have fixed up the items from the review and tested the patch on a `ppc64le` machine. The Numba test suite now passes on that platform w/o linking...

> @stuartarchibald thank you for the review. I double checked [b17fe1a](https://github.com/numba/numba/commit/b17fe1a4d74ebb55c3d698e67b2b54d0529d4f8e) manually again on a power8 machine. I was able to compile Numba and the test suite ran to completion...