stream-lib
Stream summarizer and cardinality estimator.
The `set` and `updateIfGreater` methods of the `RegisterSet` class update a register value but do not check whether the value is greater than 0x1f. This may be a bug.
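To illustrate why an unchecked value could matter, here is a minimal sketch that assumes registers are stored as 5-bit fields packed six to an `int`. The class `PackedRegisters` and its methods are hypothetical and are not stream-lib's `RegisterSet`. Under that assumption, a value above 0x1f written without a range check spills into the neighbouring register.

```java
// Hypothetical sketch, not stream-lib's RegisterSet:
// 5-bit registers packed six per 32-bit int (30 bits used, 2 unused).
public class PackedRegisters {
    private static final int REGISTER_BITS = 5;
    private static final int REGISTERS_PER_WORD = 6;
    private static final int MAX_VALUE = 0x1f;

    private final int[] words;

    public PackedRegisters(int count) {
        words = new int[(count + REGISTERS_PER_WORD - 1) / REGISTERS_PER_WORD];
    }

    public int get(int position) {
        int word = position / REGISTERS_PER_WORD;
        int shift = REGISTER_BITS * (position % REGISTERS_PER_WORD);
        return (words[word] >>> shift) & MAX_VALUE;
    }

    // Unchecked write: any bits above the low five land in the next register.
    public void setUnchecked(int position, int value) {
        int word = position / REGISTERS_PER_WORD;
        int shift = REGISTER_BITS * (position % REGISTERS_PER_WORD);
        words[word] = (words[word] & ~(MAX_VALUE << shift)) | (value << shift);
    }

    // Checked write: rejects values that do not fit in five bits.
    public void set(int position, int value) {
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("register value out of range: " + value);
        }
        setUnchecked(position, value);
    }
}
```

In this sketch, `setUnchecked(0, 0x20)` leaves register 0 at zero and changes register 1, while `set(0, 0x20)` fails fast.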
I have implemented Recordinality, a stream algorithm. The Recordinality class extends from ICardinality and Serialize. You should check the serialization process and the merge (which returns an exception because Recordinality doesn't...
Hi guys, I was wondering whether it is intentional that `QDigest` implements neither `Serializable` nor `Externalizable`? I'm afraid it prevents `QDigest` from being serialized using `ObjectOutputStream`, which...
Sometimes, when I merged two tdigests by calling the `add` method of one of them, it threw a NullPointerException as follows:

    java.lang.NullPointerException
        at com.clearspring.analytics.stream.quantile.GroupTree.add(GroupTree.java:85)
        at com.clearspring.analytics.stream.quantile.GroupTree.add(GroupTree.java:79)
        at com.clearspring.analytics.stream.quantile.GroupTree.add(GroupTree.java:79)
        at com.clearspring.analytics.stream.quantile.GroupTree.add(GroupTree.java:79)
        at com.clearspring.analytics.stream.quantile.GroupTree.add(GroupTree.java:79)...
The width of the sketch [according to the paper](http://www.cse.unsw.edu.au/~cs9314/07s1/lectures/Lin_CS9314_References/cm-latin.pdf) should be set to `ceil(e/epsilon)` where e is Euler's number. However, I noticed in the [current code](https://github.com/addthis/stream-lib/blob/af045cb4199959c07fb4422e605573e180491191/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java#L56-L64) that this is just...
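As a worked illustration of the formula quoted from the paper, the sketch below computes the width as `ceil(e/epsilon)`. The depth formula `ceil(ln(1/delta))` also comes from the Count-Min paper, not from the issue text. The class and method names are hypothetical and do not reflect the `CountMinSketch` constructor in stream-lib.

```java
// Hypothetical helper; names are illustrative only.
public class CountMinDimensions {
    // Width per the Count-Min paper: ceil(e / epsilon), with e Euler's number.
    public static int width(double epsilon) {
        return (int) Math.ceil(Math.E / epsilon);
    }

    // Depth per the Count-Min paper: ceil(ln(1 / delta)),
    // where delta is the allowed failure probability.
    public static int depth(double delta) {
        return (int) Math.ceil(Math.log(1.0 / delta));
    }
}
```

For example, `epsilon = 0.01` gives a width of 272 columns, since e/0.01 is about 271.83.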
Hi, `HyperLogLogPlus.offerHashed()` always returns `true` in SPARSE format, even if the hash has already been counted. https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java#L309 Test:
``` java
@Test
public void testOffer() {
    HyperLogLogPlus hll = new HyperLogLogPlus(5, 25);
    assertTrue(hll.offer("ABC"));
    assertFalse(hll.offer("ABC"));
    //...
```
When hashing Strings, MurmurHash simply uses the `getBytes()` method, which uses the platform's default encoding. This is not portable. I've changed all calls to `getBytes(Charset)` and fixed the encoding...
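The portability concern can be shown with a small sketch: the same string encodes to different bytes under different charsets, so a hash computed over `getBytes()` output can vary with the JVM's default encoding. The class `PortableBytes` is hypothetical, and the choice of UTF-8 as the fixed encoding is an assumption, since the excerpt is cut off before naming one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; UTF-8 is an assumed choice of fixed encoding.
public class PortableBytes {
    // Encodes as UTF-8 regardless of the JVM's default charset.
    public static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes as ISO-8859-1, shown only to contrast with UTF-8.
    public static byte[] latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

The string `"\u00e9"` (é) is two bytes in UTF-8 and one byte in ISO-8859-1, so hashing the raw bytes would give different results on platforms with different defaults.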
Replaced all of the mutating operations of RegisterSet with atomic ones so that HLLs can safely be updated by multiple threads. Tests pass on multiple runs, performance impact on the...
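One common way to make an update-if-greater operation atomic is a compare-and-set loop. The sketch below is a simplified illustration that assumes one register per array slot (no bit packing) and uses `java.util.concurrent.atomic.AtomicIntegerArray`. The class `AtomicRegisters` is hypothetical and is not the code from this change.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: one register per slot, updated with a CAS loop.
public class AtomicRegisters {
    private final AtomicIntegerArray registers;

    public AtomicRegisters(int count) {
        registers = new AtomicIntegerArray(count);
    }

    public int get(int position) {
        return registers.get(position);
    }

    // Raises the register to value if value is larger; returns true if it changed.
    // Retries when another thread modified the slot between the read and the CAS.
    public boolean updateIfGreater(int position, int value) {
        while (true) {
            int current = registers.get(position);
            if (value <= current) {
                return false;
            }
            if (registers.compareAndSet(position, current, value)) {
                return true;
            }
        }
    }
}
```

Because the loop re-reads the slot after a failed compare-and-set, a concurrent larger write is never overwritten by a smaller one.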
I extensively use the built-in serialization/de-serialization mechanism offered by ICardinality estimators and was a bit irritated by having to catch cumbersome and useless IOExceptions. As long as we are working...
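One possible way to avoid catching checked exceptions at every call site is a small wrapper that rethrows `IOException` as `java.io.UncheckedIOException`. This is only a sketch of that general idea, not the approach proposed in the excerpt, which is cut off. `UncheckedBytes` and `IOSupplier` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: converts a checked IOException into an unchecked one.
public class UncheckedBytes {
    // Stand-in for any call that declares IOException.
    @FunctionalInterface
    public interface IOSupplier<T> {
        T get() throws IOException;
    }

    public static <T> T unchecked(IOSupplier<T> supplier) {
        try {
            return supplier.get();
        } catch (IOException e) {
            // Preserve the original exception as the cause.
            throw new UncheckedIOException(e);
        }
    }
}
```

A call site could then read `byte[] bytes = UncheckedBytes.unchecked(() -> estimator.getBytes());`, assuming `estimator` exposes a `getBytes()` method that declares `IOException`.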