fastbloom icon indicating copy to clipboard operation
fastbloom copied to clipboard

A simple but fast bloomfilter written in Python

Fastbloom

A lightweight but fast Bloomfilter written in Python(2.7).

Introduction

Based on mmap and MurMur, Spooky hash functions as the base hashes. Use double hashing to reduce the number of hashes to two.

Requirements

Install boost and boost-python

  • Ubuntu

sudo apt-get install libboost-all-dev

  • Centos

sudo yum install boost-devel

  • OSX

brew install boost boost-python

Install pyhash

sudo pip install pyhash

Install fastbloom

sudo pip install fastbloom

Examples

>>> from fastbloom import BloomFilter
>>> filter_ = BloomFilter(10000, 0.001) # set size of input 1000, error rate 0.1%
>>> filter_.add('www.google.com')
>>> 'www.google.com' in filter_
True
>>> 'www.github.com' in filter_
False

Benchmark

When input_size set to 1000000, accepted_error_rate set to 0.01%

  • Memory consumption: 4 MB
  • Add operation: 7322.42 ops
  • Check operation: 24788.79 ops
  • Actual fault positive rate: 0.0000

You can just run the benchmark.py to see the actual benchmarks

Todo

  • add a faster bloomfilter
  • add save and restore functions(as files)
  • write a scalable bloomfilter