
calculation of sum is slow and uses only one core

Open FrancescAlted opened this issue 11 years ago • 7 comments

From [email protected] on February 19, 2012 03:06:02

```python
a = numpy.random.random((10000, 10000))
numexpr.evaluate("sin(a) + exp(a) + log(a + 3)").sum()   # fast, uses all cores
numexpr.evaluate("sum(sin(a) + exp(a) + log(a + 3))")    # slow, uses one core
```

I often use sum for expressions like sum(exp(a[:,None]*b[None,:])), where two vectors are passed to numexpr, and one number is an output. It would be great to avoid creation of an array a[:,None] * b[None,:] at all.
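One numexpr-independent way to avoid materializing the full `a[:,None] * b[None,:]` array is chunked evaluation: process a slab of rows at a time and accumulate the scalar result. A minimal NumPy-only sketch (`chunked_outer_exp_sum` is a made-up helper name, not part of either library):

```python
import numpy as np

def chunked_outer_exp_sum(a, b, chunk=256):
    """Compute sum(exp(a[:, None] * b[None, :])) without ever
    materializing the full len(a) x len(b) outer-product array;
    peak memory is one chunk x len(b) slab."""
    total = 0.0
    for start in range(0, len(a), chunk):
        slab = a[start:start + chunk, None] * b[None, :]  # small temporary
        total += np.exp(slab).sum()
    return total

a = np.random.random(1000)
b = np.random.random(800)
direct = np.exp(a[:, None] * b[None, :]).sum()
assert np.isclose(chunked_outer_exp_sum(a, b), direct)
```

Each slab can of course also be fed through `numexpr.evaluate` instead of plain NumPy ufuncs to regain multithreading per chunk.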

numexpr: 2.0.1, numpy:1.6.1, OS: ubuntu 11.04

Original issue: http://code.google.com/p/numexpr/issues/detail?id=73

FrancescAlted avatar Jan 22 '14 10:01 FrancescAlted

From [email protected] on March 03, 2012 23:16:39

I can reproduce this issue. It looks like sum (even on a single array) is not using multiple threads and anything inside the sum won't be accelerated.

What would be the best way to code this? I'll be happy to help.

Thanks.

FrancescAlted avatar Jan 22 '14 10:01 FrancescAlted

From [email protected] on July 31, 2012 08:59:38

+1 In a use case I am encountering, the numexpr.evaluate("sum(a)") version takes over 60 s to complete and uses only one core, BUT keeps the memory usage quite low. OTOH, the numexpr.evaluate("a").sum() version takes just a few seconds to complete and uses many cores, BUT uses as much as 15 GB of memory, albeit only momentarily.
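A middle ground between the two variants described above is to evaluate the expression block by block and accumulate the partial sums, which caps peak memory at one block's worth of temporaries while leaving each block free to be evaluated with all cores. A sketch in plain NumPy (`blockwise_expr_sum` is a hypothetical helper; swap the inner expression for a per-block `numexpr.evaluate` call to get multithreading back):

```python
import numpy as np

def blockwise_expr_sum(a, block_rows=1000):
    """Evaluate sin(a) + exp(a) + log(a + 3) one row-block at a time,
    accumulating the sum so only one block-sized temporary exists at once."""
    total = 0.0
    for start in range(0, a.shape[0], block_rows):
        blk = a[start:start + block_rows]
        total += (np.sin(blk) + np.exp(blk) + np.log(blk + 3)).sum()
    return total

a = np.random.random((2000, 500))
assert np.isclose(blockwise_expr_sum(a),
                  (np.sin(a) + np.exp(a) + np.log(a + 3)).sum())
```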

This is a very substantial defect which appears to affect multiple users.

FrancescAlted avatar Jan 22 '14 10:01 FrancescAlted

From [email protected] on July 31, 2012 11:05:10

I don't think the default ndarray.sum() method is capable of using more than one core. The dirty workaround I currently use is a parallel sum via OpenMP and weave.inline. Here's the function...

```python
def openmpSum(in_array):
    """Performs a fast sum of an array using OpenMP."""
    from scipy import weave
    a = numpy.asarray(in_array)
    b = numpy.array([1.])
    N = int(numpy.prod(a.shape))
    code = r"""
    int i = 0;
    double sum = 0;
    omp_set_num_threads(4);
    #pragma omp parallel for \
        default(shared) private(i) \
        reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];
    b[0] = sum;
    """
    weave.inline(code, ['a', 'N', 'b'],
                 extra_compile_args=['-march=native -O3 -fopenmp'],
                 support_code=r"""
                 #include <stdio.h>
                 #include <omp.h>
                 #include <math.h>
                 """,
                 libraries=['gomp'])
    return b[0]
```
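Note that scipy.weave was removed from SciPy (in 0.19) and was never ported to Python 3, so the snippet above only runs on legacy stacks. A dependency-free alternative on modern Python is to split the array across a thread pool; this relies on NumPy releasing the GIL inside its reduction loops for native dtypes, so the chunks actually sum in parallel. A sketch (`parallel_sum` is a made-up name, not an API of either library):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(a, n_threads=4):
    """Sum an array by splitting it into n_threads chunks and reducing
    the chunks concurrently; np.sum drops the GIL on large native-dtype
    buffers, so the threads run in parallel."""
    chunks = np.array_split(np.ravel(a), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(np.sum, chunks))

a = np.random.random(10_000_000)
assert np.isclose(parallel_sum(a), a.sum())
```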

FrancescAlted avatar Jan 22 '14 10:01 FrancescAlted

These reports date to mid-2012, but I am still seeing this issue with the latest version of numexpr. Has any progress been made on solving it? This library is otherwise extremely nice, but this defect makes it impossible for me to use on my project.

apontzen avatar Sep 04 '14 19:09 apontzen

Right now numexpr is in pure maintenance mode. I typically still have time though to revise pull requests and merge them if appropriate, but not much more than this. So if this is something that you want to see in numexpr, you still could send a PR and I would revise it.

FrancescAlted avatar Sep 05 '14 09:09 FrancescAlted

Thanks for the clarification. I'll look into it and send a PR if I can.

apontzen avatar Sep 05 '14 09:09 apontzen

Thanks for keeping the issue open, as it is still present.

kif avatar Sep 28 '18 09:09 kif

Any news on this issue? evaluate('sum(X, 2)') with X being a 4d ndarray is slower than NumPy and only uses one core for me.
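For reference, if I read numexpr's reduction signature correctly, the second argument to `sum` is the axis, so `evaluate('sum(X, 2)')` should match the plain-NumPy axis reduction that the commenter is benchmarking against:

```python
import numpy as np

X = np.random.random((4, 5, 6, 7))
# Plain-NumPy equivalent of numexpr's evaluate('sum(X, 2)'):
out = X.sum(axis=2)
assert out.shape == (4, 5, 7)  # axis 2 (length 6) is reduced away
```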

artpelling avatar Dec 06 '23 15:12 artpelling
