
Need to route ufunc signatures we do not have a hook for into the threader

Open tdimitri opened this issue 5 years ago • 1 comments

We can thread ufuncs we do not understand. For a binary_reduce on a large array, we can divide up the work, assigning each chunk to a thread. Each work item would write its result to a slot in a separate output array (allocated on the fly). That output array can then be sent back through the binary_reduce loop for the final calculation. For example, each thread calculates the sum of its chunk, and the final calculation takes the sum of those sums.
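The chunked reduce described above could look roughly like the following in Python. This is only a sketch of the idea, not pnumpy's implementation: `threaded_reduce` and `n_chunks` are hypothetical names, the input is assumed to be 1-D, and the approach only gives the same answer as a plain reduce when the ufunc is associative (floating-point sums may differ in rounding).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_reduce(ufunc, arr, n_chunks=4):
    """Reduce a 1-D array by reducing chunks in threads, then reducing the partials.

    Hypothetical helper for illustration; assumes `ufunc` is a binary,
    associative ufunc such as np.add or np.maximum.
    """
    if arr.size == 0:
        # Nothing to split; let NumPy handle (or reject) the empty case.
        return ufunc.reduce(arr)

    # Split the work into chunks, dropping any empty ones.
    chunks = [c for c in np.array_split(arr, n_chunks) if c.size]

    # Each work item reduces one chunk; pool.map keeps one result slot per chunk.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = np.array(list(pool.map(ufunc.reduce, chunks)))

    # Final step: send the partial results back through the same reduce.
    return ufunc.reduce(partials)


data = np.arange(1_000_000, dtype=np.int64)
total = threaded_reduce(np.add, data)        # sum of per-chunk sums
largest = threaded_reduce(np.maximum, data)  # max of per-chunk maxima
```

The partial results are collected from the workers rather than written into a preallocated array so that their dtype follows whatever the ufunc's reduce actually returns.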

For operations on large arrays that are not a binary reduce, we can divide up the work as normal (for both binary and unary ufuncs).
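For the elementwise case, dividing the work "as normal" might be sketched like this. Again, this is an illustration under assumptions rather than pnumpy's code: `threaded_elementwise` and `n_chunks` are hypothetical names, inputs are assumed to be 1-D arrays of equal length, and the ufunc is assumed to have a single output.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_elementwise(ufunc, *inputs, n_chunks=4):
    """Apply a unary or binary ufunc over slices of the inputs in threads.

    Hypothetical helper for illustration; assumes 1-D inputs of equal
    length and a ufunc with exactly one output.
    """
    n = inputs[0].shape[0]

    # Calling the ufunc on empty slices reveals the output dtype cheaply.
    out_dtype = ufunc(*[x[:0] for x in inputs]).dtype
    out = np.empty(n, dtype=out_dtype)

    # Chunk boundaries: n_chunks contiguous, non-overlapping ranges.
    bounds = [n * i // n_chunks for i in range(n_chunks + 1)]

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Each thread writes directly into its own slice of the output.
        ufunc(*[x[lo:hi] for x in inputs], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        list(pool.map(work, range(n_chunks)))
    return out


values = np.arange(10, dtype=np.float64)
roots = threaded_elementwise(np.sqrt, values, n_chunks=3)         # unary ufunc
doubled = threaded_elementwise(np.add, values, values, n_chunks=3)  # binary ufunc
```

Because the chunks are disjoint slices of one output array, no final combining step is needed, unlike the reduce case.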

tdimitri • Sep 29 '20 13:09

> we can thread ufuncs we do not understand.

We can reuse the original loop pointer that PyUFunc_ReplaceLoopBySignature hands back and call it from our loop override. This has the advantage of using the SSE/AVX-optimized loop without needing our own CPU detection, since NumPy already uses AVX for many ufuncs.

mattip • Sep 29 '20 13:09