SummingCache should use sumOption

Open avibryant opened this issue 10 years ago • 2 comments

SummingCache could buffer up multiple values for a key and sum them in one pass before flush using sumOption; this would trade memory use for performance.

If we standardized a pattern for mutable updates like HLL's updateInto, this could improve both performance and memory use.

Mar 20 '14 17:03 avibryant

I've wondered a little about this. It is currently optimized for a reasonably low cache hit rate. If we get two values for a key, we should merge them right then rather than building a list. (This path comes up often in summingbird.) An alternative would be to combine everything into a single Map[K, List[V]] on flush, and then sumOption the V's for each K. It would be interesting to see how much garbage this creates, what hit rate it requires, and how large the sumOption benefit has to be for this to be a particularly good win.

Mar 20 '14 17:03 ianoc
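
For reference, the buffer-then-sumOption idea discussed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Semigroup` here is a minimal local stand-in for Algebird's trait (redefined so the snippet runs without the library), and `BufferingCache` is a hypothetical name, not the existing SummingCache. The default `sumOption` in this stand-in just folds with `plus`, so any gain would only show up for a semigroup that overrides it with a faster batch sum.

```scala
import scala.collection.mutable

object SumOptionSketch {
  // Minimal stand-in for Algebird's Semigroup, redefined locally so the
  // sketch is self-contained. Not the library's actual definition.
  trait Semigroup[T] {
    def plus(l: T, r: T): T
    // Default: fold with plus. A real instance may override this with a
    // cheaper batch implementation, which is where buffering would pay off.
    def sumOption(items: Iterable[T]): Option[T] = items.reduceOption(plus)
  }

  object Semigroup {
    implicit val intSemigroup: Semigroup[Int] = new Semigroup[Int] {
      def plus(l: Int, r: Int): Int = l + r
    }
  }

  // Hypothetical cache: keeps every value per key and sums each buffer
  // once, on flush. This trades memory for fewer, larger sums.
  class BufferingCache[K, V](implicit sg: Semigroup[V]) {
    private val buffers = mutable.Map.empty[K, mutable.ArrayBuffer[V]]

    def put(k: K, v: V): Unit = {
      buffers.getOrElseUpdate(k, mutable.ArrayBuffer.empty[V]) += v
      ()
    }

    def flush(): Map[K, V] = {
      // toMap forces the iterator before the buffers are cleared.
      val out = buffers.iterator.flatMap { case (k, vs) =>
        sg.sumOption(vs).map(v => k -> v)
      }.toMap
      buffers.clear()
      out
    }
  }
}
```

Whether this beats merging eagerly on each hit depends on the hit rate and on how much garbage the per-key buffers generate, which is the open question raised above.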

Yeah, buffering up lists may not actually make any sense, but a mutable buffer pattern like HLL's updateInto is an obvious win (and we can have a default implementation of that which just does the standard plus() into some box).

Mar 20 '14 17:03 avibryant
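
A rough sketch of what such a mutable buffer pattern might look like, with a default that just does `plus` into a box. All names here (`BufferedSum`, `Box`, `boxed`, `vectorSum`, `MutableSummingCache`) are hypothetical and chosen for illustration; none of them are Algebird APIs, and the method is called `updateInto` only to echo the HLL method mentioned above.

```scala
import scala.collection.mutable

object MutableBufferSketch {
  // Hypothetical type class: B is a mutable buffer accumulating values of type V.
  trait BufferedSum[V, B] {
    def newBuffer(first: V): B
    def updateInto(buffer: B, v: V): Unit
    def result(buffer: B): V
  }

  // Default implementation: a box holding the running sum, updated with plus.
  final class Box[V](var value: V)

  def boxed[V](plus: (V, V) => V): BufferedSum[V, Box[V]] =
    new BufferedSum[V, Box[V]] {
      def newBuffer(first: V): Box[V] = new Box(first)
      def updateInto(b: Box[V], v: V): Unit = { b.value = plus(b.value, v) }
      def result(b: Box[V]): V = b.value
    }

  // Specialized implementation: element-wise vector sum done in place,
  // so a cache hit allocates nothing. Assumes all vectors for a key
  // have the same length.
  val vectorSum: BufferedSum[Vector[Long], Array[Long]] =
    new BufferedSum[Vector[Long], Array[Long]] {
      def newBuffer(first: Vector[Long]): Array[Long] = first.toArray
      def updateInto(b: Array[Long], v: Vector[Long]): Unit = {
        var i = 0
        while (i < b.length) { b(i) += v(i); i += 1 }
      }
      def result(b: Array[Long]): Vector[Long] = b.toVector
    }

  // Hypothetical cache holding one mutable buffer per key.
  class MutableSummingCache[K, V, B](bs: BufferedSum[V, B]) {
    private val buffers = mutable.Map.empty[K, B]

    def put(k: K, v: V): Unit = buffers.get(k) match {
      case Some(b) => bs.updateInto(b, v)        // hit: mutate in place
      case None    => buffers(k) = bs.newBuffer(v) // miss: start a buffer
    }

    def flush(): Map[K, V] = {
      val out = buffers.iterator.map { case (k, b) => k -> bs.result(b) }.toMap
      buffers.clear()
      out
    }
  }
}
```

With the boxed default, behavior matches merging on every hit; a type with a cheap in-place update can supply its own instance and avoid the intermediate allocations.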