stream-lib icon indicating copy to clipboard operation
stream-lib copied to clipboard

CountMinSketch overflow check causes a lot of allocations

Open alexandrnikitin opened this issue 7 years ago • 4 comments

It starts from String.valueOf(item) here https://github.com/addthis/stream-lib/blob/586814173573254d13a81fdb6f60b3207605c747/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java#L205 String concatenation here https://github.com/addthis/stream-lib/blob/586814173573254d13a81fdb6f60b3207605c747/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java#L189 and here https://github.com/addthis/stream-lib/blob/586814173573254d13a81fdb6f60b3207605c747/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java#L289 Could be related to #140

alexandrnikitin avatar Aug 22 '17 06:08 alexandrnikitin

I'm not sure. Stream-lib's CountMinSketch has fallen behind what @tdunning has in his repo. Hopefully we can work on pulling in a more recent version. Otherwise we may need to deprecate this class.

abramsm avatar Aug 23 '17 15:08 abramsm

I don't think I have a CMS implementation.

I have t-digest and FloatHistogram. Just had a release last week and some big stuff coming soon beyond that.

tdunning avatar Aug 24 '17 01:08 tdunning

Sorry @tdunning. My memory seems to be failing me... Great news on new releases of T-digest and FloatHistogram.

@alexandrnikitin I'll look into this and #140 as soon as I can.

abramsm avatar Aug 24 '17 15:08 abramsm

@abramsm Thank you. I'm ready to help with it. Basically I would just remove the overflow check. I can also introduce microbenchmarks to prevent regression. I just need an answer for #140 whether there's something I miss or not? I see it only as an optimization for fast comparison.

alexandrnikitin avatar Aug 24 '17 15:08 alexandrnikitin