evo icon indicating copy to clipboard operation
evo copied to clipboard

Faster stats package

Open bliksemstraal opened this issue 3 years ago • 3 comments

Hi,

I love your project!!

I changed the stats package to a faster single-pass calculation. sumsq is now truly a sum of squares and mean was replaced with sum. This allows for a much faster Put method which is where most time is spent. The determination of min and max is faster too as there is not overhead call to functions in the math package.

Unit tests pass as they are. A local benchmark showed original package of ~18ns/op and new package is ~8ns/op. Every bit helps, especially in genetic algorithms.

bliksemstraal avatar Jan 06 '21 08:01 bliksemstraal

Thanks for the contribution! I'm excited to see real users out there.

IIUC this patch is using the naive algorithm for computing variance.

The original algorithm used here is Welford's algorithm for Put and Chan et al.'s algorithm for Merge.

These are numerically stable algorithms, which minimize rounding error. The idea is that the sum can be very large w.r.t. the other values, and error creeps in when performing computation on values at different scales. We can eliminate this by never using the sum directly.

That said, numerical stability may not be that important. The 2.25x speedup just from switching to the naive algorithm may be worth it for many user cases.

So there are trade offs. I am not sure if I want to straight up replace a numerically stable implementation with an unstable-but-fast implementation. I will need to think about it more deeply.

Case studies or macro benchmarks would help sway me either way.

cbarrick avatar Jan 09 '21 05:01 cbarrick

Also, the contributors file is unnecessary. Git tracks that metadata automatically.

cbarrick avatar Jan 09 '21 17:01 cbarrick

By changing the method signatures from Stats to *Stats, you are changing the semantics.

The original version does not modify the stats object.

It may be worth it to change, but it is backwards incompatible.

(Not that this library is likely to have any real production users.)

cbarrick avatar Jan 09 '21 17:01 cbarrick