quicksilver
quicksilver copied to clipboard
Quicksilver - a library of approximate algorithms and sketches for Rust
Overview
Quicksilver is a library of approximate algorithm and sketches
The algorithms contained within this library are designed to calculate common metrics and statistics (such as cardinality, frequency, etc) in an approximate fashion. In exchange for decreased accuracy, these algorithms typically have minimal memory overhead or are very fast. Or both.
Build status with Rust Nightly:
##Documentation
API Docs are auto-generated after every commit: http://www.rust-ci.org/polyfractal/quicksilver/doc/quicksilver/
Public modules
HLL - HyperLogLog
Approximates cardinality estimation with minimal memory overhead
This implements HyperLogLog, an algorithm which provides a reasonably accurate estimation of cardinality. It is very fast (200m op/s on my Macbook Air) and requires minimal memory. It can estimate the cardinality of sets that contain billions of entries.
See this article for a good laymen explanation of HyperLogLog.
Usage
use quicksilver::hll::HLL;
use std::hash::Hash;
let mut hll = HLL::new(12);
for i in range(0u, 1000000) {
let hash = hash::hash(&i);
hll.offer_hashed(&hash);
}
let cardinality = hll.cardinality();
Precision vs Memory chart
Precision | Size | Worst Case Estimate Error (+/-) |
4 | 48 bytes | 39.00% |
5 | 60 bytes | 27.50% |
6 | 84 bytes | 19.50% |
7 | 136 bytes | 13.78% |
8 | 240 bytes | 9.75% |
9 | 444 bytes | 6.89% |
10 | 852 bytes | 4.8% |
11 | 1672 bytes | 3.44% |
12 | 3312 bytes | 2.43% |
13 | 6588 bytes | 1.72% |
14 | 13140 bytes | 1.21% |
15 | 26248 bytes | 0.86% |
16 | 52464 bytes | 0.60% |
17 | 104892 bytes | 0.43% |
18 | 209748 bytes | 0.30% |
PCSA - Probabilistic Counting with Stochastic Averaging
Implements the Probabilistic Counting with Stochastic Averaging counter. PCSA provides an approximate estimate of cardinality with bounded error. Relative error is 0.78 / sqrt(m)
See this article for a layman's explanation of PCSA
Original paper can be found here
Note: PCSA is generally inferior to HLL in estimation accuracy, memory usage and performance
Usage
use quicksilver::pcsa::PCSA;
use std::hash::Hash;
let mut hll = PCSA::new(10);
for i in range(0u, 1000000) {
let hash = hash::hash(&i);
pcsa.offer_hashed(hash);
}
let cardinality = pcsa.cardinality();
Precision vs Memory chart
Precision | Size | Error on 1m Cardinality Test | Worst Case Estimate Error (+/-) |
4 | 80 bytes | 0.09% | 39.00% |
5 | 144 bytes | 4.14% | 34.88% |
6 | 272 bytes | 1.18% | 31.84% |
7 | 528 bytes | 3.62% | 29.48% |
8 | 1040 bytes | 0.71% | 27.57% |
9 | 2064 bytes | 1.51% | 26.00% |
10 | 4112 bytes | 0.77% | 24.66% |
11 | 8208 bytes | 1.15% | 23.51% |
12 | 16400 bytes | 1.10% | 22.51% |
13 | 32784 bytes | 0.96% | 21.63% |
14 | 65552 bytes | 0.20% | 20.84% |
15 | 131088 bytes | 0.18% | 20.13% |