fastbloom
fastbloom copied to clipboard
A simple but fast bloomfilter written in Python
Fastbloom
A lightweight but fast Bloomfilter written in Python(2.7).
Introduction
Based on mmap and MurMur, Spooky hash functions as the base hashes. Use double hashing to reduce the number of hashes to two.
Requirements
Install boost and boost-python
- Ubuntu
sudo apt-get install libboost-all-dev
- Centos
sudo yum install boost-devel
- OSX
brew install boost boost-python
Install pyhash
sudo pip install pyhash
Install fastbloom
sudo pip install fastbloom
Examples
>>> from fastbloom import BloomFilter
>>> filter_ = BloomFilter(10000, 0.001) # set size of input 1000, error rate 0.1%
>>> filter_.add('www.google.com')
>>> 'www.google.com' in filter_
True
>>> 'www.github.com' in filter_
False
Benchmark
When input_size set to 1000000, accepted_error_rate set to 0.01%
- Memory consumption: 4 MB
- Add operation: 7322.42 ops
- Check operation: 24788.79 ops
- Actual fault positive rate: 0.0000
You can just run the benchmark.py to see the actual benchmarks
Todo
- add a faster bloomfilter
- add save and restore functions(as files)
- write a scalable bloomfilter