ndarray-stats

Work on performance issues in summary statistics due to using ArrayBase::sum

Open. jturner314 opened this issue 6 years ago • 1 comment

The summary statistics methods use ArrayBase::sum (directly or indirectly) in anticipation of pairwise summation (rust-ndarray/ndarray#577), which provides improved accuracy over naive summation using fold. However, to do this, some of the methods have unnecessary allocations or other performance issues.

For example, harmonic_mean is implemented like this:

self.map(|x| x.recip()).mean().map(|x| x.recip())

It's implemented this way to take advantage of .mean() (which is implemented in terms of .sum()), but this approach requires a temporary allocation for the result of self.map.
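
To make the trade-off concrete, here is a minimal sketch of a single-pass harmonic mean that avoids the temporary. It is not the ndarray-stats implementation: `harmonic_mean_no_alloc` is a hypothetical name, it works on a plain `&[f64]` rather than an `ArrayBase`, and it accumulates with the standard library's `Iterator::sum`, so it does not get the accuracy benefit of pairwise summation.

```rust
/// Harmonic mean of a slice, computed in one pass with no temporary buffer.
/// Hypothetical helper for illustration; returns `None` for an empty slice.
fn harmonic_mean_no_alloc(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    // The reciprocals are produced lazily by the iterator adapter and
    // consumed immediately by `sum`, so nothing is allocated.
    let sum_of_recips: f64 = xs.iter().map(|x| x.recip()).sum();
    Some((sum_of_recips / xs.len() as f64).recip())
}
```

For example, `harmonic_mean_no_alloc(&[1.0, 2.0, 4.0])` evaluates 3 / (1 + 1/2 + 1/4), which is 12/7.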

summary_statistics::means::moments has a similar issue:

for k in 2..=order {
    moments.push(a.map(|x| x.powi(k)).sum() / n_elements)
}

It's implemented this way to take advantage of .sum(). However, this implementation requires a temporary allocation for the result of a.map on every iteration. Additionally, it would probably be faster to make the loop over k the innermost loop, so that each element is read once and all of its powers are accumulated while it is still in cache.
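
A rough sketch of the "k innermost" loop order, under several assumptions: `power_means` is a hypothetical name, the input is a plain `&[f64]` standing in for `a`, powers are built by repeated multiplication instead of `powi` (which can differ in the last bits), and the accumulation is naive rather than pairwise.

```rust
/// For each k in 2..=order, returns the mean of x^k over `a`.
/// Hypothetical helper: one pass over the data, no per-k temporary array.
fn power_means(a: &[f64], order: u16) -> Vec<f64> {
    // One accumulator per order in 2..=order (none if order < 2).
    let n_terms = order.saturating_sub(1) as usize;
    let mut sums = vec![0.0_f64; n_terms];
    for &x in a {
        let mut power = x; // x^1
        // Innermost loop runs over k, so `x` is loaded only once.
        for s in sums.iter_mut() {
            power *= x; // x^2, x^3, ...
            *s += power;
        }
    }
    // Note: an empty `a` with order >= 2 yields NaN entries (0.0 / 0.0).
    let n = a.len() as f64;
    sums.into_iter().map(|s| s / n).collect()
}
```

The only allocation is the small vector of `order - 1` accumulators, independent of the number of elements.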

We should be able to resolve these issues with a lazy version of map combined with a pairwise summation method on that lazy map. Something like jturner314/nditer would work once it's stable.
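
As an illustration of the idea only, pairwise summation can be fused with the mapping closure so that mapped values are never stored. The sketch below assumes contiguous data in a `&[f64]`; `pairwise_sum_map` and the cutoff `BASE` are hypothetical, and a real version for ndarray would also need to handle arbitrary strides and dimensions.

```rust
/// Pairwise (cascade) sum of `f(x)` over a slice, without materializing
/// the mapped values. Hypothetical sketch for contiguous data only.
fn pairwise_sum_map<F>(xs: &[f64], f: &F) -> f64
where
    F: Fn(f64) -> f64,
{
    // Below this length, fall back to a simple left-to-right fold.
    // The cutoff value is an arbitrary choice for illustration.
    const BASE: usize = 64;
    if xs.len() <= BASE {
        xs.iter().fold(0.0, |acc, &x| acc + f(x))
    } else {
        // Split in half and add the two partial sums.
        let (left, right) = xs.split_at(xs.len() / 2);
        pairwise_sum_map(left, f) + pairwise_sum_map(right, f)
    }
}
```

With such a helper, the harmonic mean of a non-empty slice `xs` could be written as `(pairwise_sum_map(xs, &|x: f64| x.recip()) / xs.len() as f64).recip()`, with no intermediate array.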

[Edit: This issue also appears in the entropy methods.]

jturner314, Mar 31 '19 20:03

I'm not sure whether this helps, but the average crate provides constant-memory algorithms to calculate various statistics.

vks, Apr 08 '19 17:04