evo
evo copied to clipboard
Faster stats package
Hi,
I love your project!!
I changed the stats package to a faster single-pass calculation. sumsq
is now truly a sum of squares and mean
was replaced with sum
. This allows for a much faster Put
method which is where most time is spent. The determination of min
and max
is faster too as there is not overhead call to functions in the math
package.
Unit tests pass as they are. A local benchmark showed original package of ~18ns/op and new package is ~8ns/op. Every bit helps, especially in genetic algorithms.
Thanks for the contribution! I'm excited to see real users out there.
IIUC this patch is using the naive algorithm for computing variance.
The original algorithm used here is Welford's algorithm for Put and Chan et al.'s algorithm for Merge.
These are numerically stable algorithms, which minimize rounding error. The idea is that the sum can be very large w.r.t. the other values, and error creeps in when performing computation on values at different scales. We can eliminate this by never using the sum directly.
That said, numerical stability may not be that important. The 2.25x speedup just from switching to the naive algorithm may be worth it for many user cases.
So there are trade offs. I am not sure if I want to straight up replace a numerically stable implementation with an unstable-but-fast implementation. I will need to think about it more deeply.
Case studies or macro benchmarks would help sway me either way.
Also, the contributors file is unnecessary. Git tracks that metadata automatically.
By changing the method signatures from Stats
to *Stats
, you are changing the semantics.
The original version does not modify the stats object.
It may be worth it to change, but it is backwards incompatible.
(Not that this library is likely to have any real production users.)