
Computing entropy

Open assaft opened this issue 1 year ago • 0 comments

Hi,

I'm trying to compute the entropy of each row vector in a 2D array using ne.evaluate('sum(where(a>0, a * log(a), 0), axis=1)').

I expected it to be faster than NumPy's alternatives, since it should compute the log and perform the multiplication only where the value is positive, and it should avoid intermediate arrays.
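As a sanity check on the expression itself, here is a minimal sketch on a tiny hand-checkable array. The array values and the `local_dict` argument are my own choices for illustration. Note that the expression yields the sum of a * log(a) with the natural log, so the entropy in bits is its negation divided by ln(2).

```python
import numexpr as ne
import numpy as np

# Tiny illustrative input: row 0 is a fair coin, row 1 has a negative entry
# that the where() clause should map to 0.
a = np.array([[0.5, 0.5],
              [1.0, -0.2]])

# Passing local_dict explicitly avoids relying on numexpr's lookup of the
# caller's variables.
nats = ne.evaluate('sum(where(a > 0, a * log(a), 0), axis=1)',
                   local_dict={'a': a})

# Convert from natural log to log base 2.
bits = nats / np.log(2)
print(bits)
```

Row 0 gives 2 * 0.5 * log2(0.5) = -1 and row 1 gives 1 * log2(1) + 0 = 0, so negating the result yields entropies of 1 bit and 0 bits.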

With no constraint on the number of threads, numexpr is indeed faster. But with a single thread, which is my use case, NumPy is faster.
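For reference, a minimal sketch of how the single-thread constraint can be applied around one evaluation and then undone. It relies on `ne.set_num_threads` returning the previous setting. The array shape matches the benchmark in this post, and the variable names are only illustrative.

```python
import numexpr as ne
import numpy as np

a = np.random.rand(20, 1000) - 0.1

# set_num_threads returns the previous thread count, so it can be restored.
previous = ne.set_num_threads(1)
try:
    result = ne.evaluate('sum(where(a > 0, a * log(a), 0), axis=1)',
                         local_dict={'a': a})
finally:
    ne.set_num_threads(previous)
```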

This is a script for performing time measurements and comparing several alternatives:

import timeit
import numexpr as ne
import numpy as np
from scipy import special


ne.set_num_threads(1)


def simple(arg):
    a, log_a = arg
    return np.sum(a * np.log2(np.where(a > 0, a, 1), out=log_a), axis=1)


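def xlogy(arg):
    a, log_a = arg
    # Clamp negatives to zero in place (this mutates the caller's array);
    # xlogy(x, y) returns 0 where x == 0.
    a[a < 0] = 0
    return np.sum(special.xlogy(a, a), axis=1) * (1/np.log(2))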


def matmul(arg):
    a, log_a = arg
    log_a.fill(0)
    np.log2(a, where=a > 0, out=log_a)
    return (a[:, None, :] @ log_a[..., None]).ravel()


def numexpr(arg):
    a, _ = arg
    # numexpr's log is the natural log; multiplying by 1/ln(2) converts to base 2.
    return ne.evaluate('sum(where(a>0, a * log(a), 0), axis=1)') * (1/np.log(2))


def setup():
    a = np.random.rand(20, 1000) - 0.1
    log = np.empty_like(a)
    return a, log


setup_code = """
from __main__ import matmul, numexpr, simple, xlogy, setup
data = setup() 
"""
func_code = "numexpr(data)"  # replace with simple, xlogy, or matmul to time the other variants

print(timeit.timeit(func_code, setup=setup_code, number=100000))
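To compare variants in a single run, something like the following sketch could be used. It first checks that a NumPy variant and the numexpr variant agree, then takes the best of several `timeit.repeat` runs for each. The names `numpy_variant` and `numexpr_variant`, the seed, and the repeat counts are assumptions for illustration, not part of the script above.

```python
import timeit

import numexpr as ne
import numpy as np

ne.set_num_threads(1)  # single-threaded, as in the use case described here


def numpy_variant(a, log_a):
    # log2 of a where positive, log2(1) = 0 elsewhere, written into log_a.
    return np.sum(a * np.log2(np.where(a > 0, a, 1), out=log_a), axis=1)


def numexpr_variant(a):
    # Natural log inside numexpr, converted to base 2 afterwards.
    nats = ne.evaluate('sum(where(a > 0, a * log(a), 0), axis=1)',
                       local_dict={'a': a})
    return nats / np.log(2)


rng = np.random.default_rng(0)
a = rng.random((20, 1000)) - 0.1
log_a = np.empty_like(a)

# Make sure both variants compute the same values before timing them.
assert np.allclose(numpy_variant(a, log_a), numexpr_variant(a))

candidates = {
    "numpy": lambda: numpy_variant(a, log_a),
    "numexpr": lambda: numexpr_variant(a),
}

number = 200
timings = {}
for name, fn in candidates.items():
    # Best of three repeats, reported as seconds per call.
    timings[name] = min(timeit.repeat(fn, number=number, repeat=3)) / number
    print(f"{name}: {timings[name] * 1e6:.1f} us per call")
```

Taking the minimum over repeats reduces noise from other processes. The absolute numbers will depend on the machine and on the numexpr and NumPy builds.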

assaft • Jul 20 '22 08:07