sum reduction for boolean arrays
Trying to sum boolean elements results in an error:
import numpy as np
import numexpr as ne
a = np.arange(10)
ne.evaluate('sum(a>5)')
KeyError: ('sum(a>5)', (('optimization', 'aggressive'), ('truediv', False)), (('a', <class 'numpy.int32'>),))
Using where seems cumbersome, but it gets the job done:
ne.evaluate('sum(where(a>5,1,0))')
Would there be any speed increase if sum were implemented for boolean arrays?
There isn't any room left in the opcode space for a boolean sum in NumExpr 2.6, unfortunately. In general, reductions in NumExpr 2.6 are single-threaded and don't show a speed advantage over NumPy. The NumExpr 3 branch doesn't have reductions implemented yet, but this would eventually be supported there.
I see; I probably shouldn't ask when NumExpr 3 will be ready?
The current reduction using where saves me memory, which is scarce, and thus increases the speed of my computations. It also results in a speed increase of 2x (even when memory is not the limit).
Good luck with NumExpr 3!
The general plan is to first get NE3 into a better state with regards to broadcasting of arrays, then add string support, and then do reductions. I want to allow reductions at any point in the code so I need some more advanced tracking of the broadcasting through the program. I've applied for some funding, and I'll continue to look for more. Ideally this could be done by the end of the summer.
For non-complex evaluation expressions, a simple workaround for you might be to use a NumPy view.
import numpy as np
import numexpr as ne
a = np.random.random((10,20)) <= 0.2
# ne.evaluate('sum(a)') # NotImplementedError: couldn't find matching opcode for 'sum_bbn'
a_ = a.view(np.uint8)
ne.evaluate('sum(a_)') # OK
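As a quick sanity check (my own addition, not from the thread), the uint8 view gives the same count as NumPy's boolean sum, since bool and uint8 both occupy one byte per element:
import numpy as np
import numexpr as ne
a = np.random.random((10, 20)) <= 0.2  # boolean array
a_ = a.view(np.uint8)                  # same buffer, reinterpreted as uint8
assert int(ne.evaluate('sum(a_)')) == int(a.sum())  # counts of True elements agree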
However, utilizing the sum function for boolean types in general might not be optimal.
Consider replacing sum with numpy's count_nonzero.
a = np.arange(1e6)
%timeit ne.evaluate('sum(where(a>500000,1,0))')
# 100 loops, best of 7: 4.18 ms per loop
a_ = (a > 500000).view(np.uint8)
%timeit (a > 500000).view(np.uint8)  # cost of the comparison plus creating the uint8 view
# 100 loops, best of 7: 837 µs per loop
%timeit ne.evaluate('sum(a_)')
# 100 loops, best of 7: 2.49 ms per loop
%timeit np.sum(a>500000)
# 1000 loops, best of 7: 1.83 ms per loop
%timeit np.count_nonzero(a>500000)
# 1000 loops, best of 7: 946 µs per loop
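Another option (an assumption on my part, not something suggested above) is to combine the two: let NumExpr do the multithreaded comparison into a preallocated boolean output via its out argument, and let NumPy do the counting, which sidesteps the missing boolean-sum opcode:
import numpy as np
import numexpr as ne
a = np.arange(1e6)
out = np.empty(a.shape, dtype=bool)   # preallocated boolean result buffer
ne.evaluate('a > 500000', out=out)    # comparison evaluated by numexpr
n = np.count_nonzero(out)             # reduction done by numpy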
That is a really neat trick using views! However, my original problem was that np.allclose was making several copies of my already large data, which could barely fit in memory. Using the reduction functions of NumExpr, I could prevent any large intermediates from being created. It is even slightly beneficial when memory is not the limiting factor.
import numpy as np
import numexpr as ne
a = np.arange(1e6)
b = np.arange(1e6)
%timeit ne.evaluate('sum(where((a-b)>1e-10,1,0))') > 0
100 loops, best of 3: 8.92 ms per loop
%timeit np.allclose(a,b)
10 loops, best of 3: 59.9 ms per loop
%timeit np.any((a-b)>1e-10)
100 loops, best of 3: 13.5 ms per loop
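Note that np.allclose actually tests abs(a - b) <= atol + rtol*abs(b) rather than a fixed threshold, so a closer drop-in replacement might look like the sketch below (my own illustration, not from the thread; the function name is made up, and NaN handling differs from np.allclose):
import numpy as np
import numexpr as ne
def allclose_ne(a, b, rtol=1e-5, atol=1e-8):
    # Count elements violating np.allclose's tolerance criterion,
    # without materialising large boolean or difference temporaries.
    violations = ne.evaluate('sum(where(abs(a - b) > atol + rtol * abs(b), 1, 0))')
    return int(violations) == 0
a = np.arange(1e6)
b = np.arange(1e6)
print(allclose_ne(a, b))  # True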
The performance difference between np.any and np.count_nonzero seems negligible.
So I was actually looking for an any function for boolean arrays, but this was out of scope for NumExpr 2.
The sum variant was the next best thing I could come up with.
Would it be possible for NumExpr 2 to use your trick to allow the sum of boolean arrays, since uint8 and boolean arrays are essentially the same as far as the reduction functions are concerned?
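NumExpr 2 doesn't do this itself, but as a rough user-level sketch (the helper name and approach are my own, not library API), the view trick could be applied automatically before evaluation:
import numpy as np
import numexpr as ne
def evaluate_with_bool_views(expr, local_dict):
    # Reinterpret boolean arrays as uint8 views so that sum() finds a
    # matching opcode; illustrative workaround only, not part of NumExpr.
    converted = {k: (v.view(np.uint8) if isinstance(v, np.ndarray) and v.dtype == np.bool_ else v)
                 for k, v in local_dict.items()}
    return ne.evaluate(expr, local_dict=converted)
a = np.arange(10) > 5
print(evaluate_with_bool_views('sum(a)', {'a': a}))  # 4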
You could also use max in this case, but it seems to be slower.
Interesting. With my current setup, I get different benchmark results. Here, the np.any option is fastest.
# CPython v3.6.1, Win7, Intel-i7
import numpy as np # v1.12.1
import numexpr as ne # v2.6.2
a = np.arange(1e6)
b = np.arange(1e6)
%timeit ne.evaluate('sum(where((a - b) > 1e-10, 1, 0))') > 0
# yours: 8.92 ms per loop
# mine: 4.77 ms ± 69.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.allclose(a, b)
# yours: 59.9 ms per loop
# mine: 20.6 ms ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.any((a - b) > 1e-10)
# yours: 13.5 ms per loop
# mine: 4.1 ms ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In addition, 'sum(where(a > (b + 1e-10), 1, 0))' is consistently a little faster than 'sum(where((a - b) > 1e-10, 1, 0))' in my tests (still behind np.any).
I can confirm the same performance characteristics with a first-gen i7 quad-core, while the first numbers were made on a Core 2 Duo.
I made a copy-paste mistake in my previous message: I forgot to take the absolute value of the error. The corrected code is below, again on the Core 2 Duo (the i7 has an identical software setup).
import sys;print("Python: ",sys.version)
import numpy as np;print("Numpy: ",np.__version__)
import numexpr as ne;print("Numexpr:",ne.__version__)
# Python: 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)]
# Numpy: 1.12.1
# Numexpr: 2.6.2
a = np.arange(1e6)
b = np.arange(1e6)
%timeit ne.evaluate('sum(where(abs(a - b) > 1e-10, 1, 0))') > 0
# 100 loops, best of 3: 10.3 ms per loop
# 100 loops, best of 3: 7.22 ms per loop (i7)
%timeit np.allclose(a, b)
# 10 loops, best of 3: 61.5 ms per loop
# 10 loops, best of 3: 26 ms per loop (i7)
%timeit np.any(abs(a - b) > 1e-10)
# 10 loops, best of 3: 22.4 ms per loop
# 100 loops, best of 3: 10 ms per loop (i7)
Again, there is more performance gain in the NumPy version when running on the i7, although, as said before, most of the performance was gained by preventing big temporaries.