numexpr
Computing entropy
Hi,
I'm trying to compute the entropy of vectors in a 2d array using
ne.evaluate('sum(where(a>0, a * log(a), 0), axis=1)')
I expected it to be faster than NumPy's alternatives, since it computes the log and performs the multiplication only where the value is positive. In addition, it should avoid intermediate arrays.
With no constraint on the number of threads, I do get better performance from numexpr. But with a single thread, which is my use case, I get better performance from NumPy.
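For reference, here is a minimal NumPy-only sketch of the quantity the expression above is meant to compute: for each row, the sum of a * log(a) over the strictly positive entries, with all other entries contributing zero. The helper name `row_xlogx` is made up for illustration and is not part of NumPy or numexpr.

```python
import numpy as np

def row_xlogx(a):
    """Per-row sum of a * ln(a) over strictly positive entries (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    out = np.zeros_like(a)
    positive = a > 0
    # Evaluate the log only where it is defined; other entries stay 0.
    out[positive] = a[positive] * np.log(a[positive])
    return out.sum(axis=1)

a = np.array([[0.5, 0.5, 0.0],
              [1.0, -0.2, 0.0]])
print(row_xlogx(a))
```

Dividing the result by `np.log(2)` gives the base-2 version used in the script.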
This is a script for performing time measurements and comparing several alternatives:
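import timeit

import numexpr as ne
import numpy as np
from scipy import special

# Force single-threaded evaluation, which is the use case being measured.
ne.set_num_threads(1)


def simple(arg):
    a, log_a = arg
    # Replace non-positive entries by 1 so that their log2 is 0.
    return np.sum(a * np.log2(np.where(a > 0, a, 1), out=log_a), axis=1)


def xlogy(arg):
    a, log_a = arg
    # Note: this clips negatives in place, so `a` stays modified for later calls.
    a[a < 0] = 0
    return np.sum(special.xlogy(a, a), axis=1) * (1/np.log(2))


def matmul(arg):
    a, log_a = arg
    log_a.fill(0)
    # Compute log2 only where a > 0; the remaining entries of log_a stay 0.
    np.log2(a, where=a > 0, out=log_a)
    # Row-wise dot product of a and log_a via batched matrix multiplication.
    return (a[:, None, :] @ log_a[..., None]).ravel()


def numexpr(arg):
    a, _ = arg
    # Natural log inside numexpr, converted to base 2 afterwards.
    return ne.evaluate('sum(where(a>0, a * log(a), 0), axis=1)') * (1/np.log(2))


def setup():
    # Values in [-0.1, 0.9), so roughly 10% of the entries are negative.
    a = np.random.rand(20, 1000) - 0.1
    log = np.empty_like(a)
    return a, log


setup_code = """
from __main__ import matmul, numexpr, simple, xlogy, setup
data = setup()
"""
func_code = "numexpr(data)"
print(timeit.timeit(func_code, setup=setup_code, number=100000))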
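To compare several variants in one run instead of editing `func_code` each time, something along these lines could be used. This is only a sketch: it assumes every candidate takes the same `(a, log_a)` tuple as above, and the helper `time_variants` and the two NumPy-only stand-in functions are illustrative names, not part of the script. It also checks that the variants agree before timing them.

```python
import timeit

import numpy as np


def where_then_log(arg):
    a, log_a = arg
    # Replace non-positive entries by 1 so that their log2 is 0.
    return np.sum(a * np.log2(np.where(a > 0, a, 1), out=log_a), axis=1)


def masked_log(arg):
    a, log_a = arg
    log_a.fill(0)
    # Compute log2 only where a > 0; the remaining entries stay 0.
    np.log2(a, where=a > 0, out=log_a)
    return np.sum(a * log_a, axis=1)


def time_variants(variants, data, number=200, repeat=3):
    """Return {name: best seconds per call}, after checking the results agree."""
    reference = next(iter(variants.values()))(data)
    timings = {}
    for name, func in variants.items():
        assert np.allclose(func(data), reference), name
        best = min(timeit.repeat(lambda: func(data), number=number, repeat=repeat))
        timings[name] = best / number
    return timings


rng = np.random.default_rng(0)
a = rng.random((20, 1000)) - 0.1
data = (a, np.empty_like(a))
results = time_variants(
    {"where_then_log": where_then_log, "masked_log": masked_log}, data
)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over several repeats tends to be less sensitive to background noise than a single long run.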