NSRL BloomFilter, Mandiant BloomFilter, Hyperloglog Malware Data Structure

Malware-Probabilistic-Data-Structures

Bloom filters save space: millions of NSRL MD5 hashes fit in about 17 MB instead of the 2.6 GB source file.

In [44]: ls -la NSRLGood.bloomfilter
-rwxr-xr-x  1 antigen  staff  17973266 12 Mar 13:15 NSRLGood.bloomfilter

In [45]: ls -la NSRLFile.txt
-r--r--r--@ 1 antigen  staff  2611139266 30 Jan 14:00 NSRLFile.txt
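
The size of a Bloom filter depends only on how many items it is built to hold and on the false-positive rate it tolerates, not on the size of the items themselves. The sketch below applies the standard sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The function names and the figures of 10 million items at a 0.001 error rate are illustrative assumptions, not necessarily the settings used to build the file listed above.

```python
import math

def bloom_size_bits(capacity, error_rate):
    """Optimal number of bits for `capacity` items at `error_rate` false positives."""
    return int(math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2)))

def bloom_num_hashes(capacity, size_bits):
    """Optimal number of hash functions for a filter of `size_bits` bits."""
    return max(1, int(round(size_bits / float(capacity) * math.log(2))))

# Hypothetical settings: 10 million hashes, 0.1% false-positive rate.
bits = bloom_size_bits(10000000, 0.001)
hashes = bloom_num_hashes(10000000, bits)
print("%.1f MB, %d hash functions" % (bits / 8.0 / 1e6, hashes))  # about 18 MB
```

Under these assumptions the filter needs roughly 14.4 bits per item, regardless of how long each line in the source file is.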

## Class ingests MD5 data from NSRLFile and updates the Bloom filter for fast in-memory queries

Example:

>>> from NSRLmd5BloomFilter import MD5BloomFilter
>>> filename = '~/Downloads/unique/NSRLFile.txt'
>>> size = 100000000
>>> error = 0.001
>>> bloom_filename = 'NSRLgood.bloomfilter'
>>> hash_bloom = MD5BloomFilter(size, error, bloom_filename)
>>> hash_bloom.process()
>>> print("F16FF81271ADA49847E6EB6BB9CB8A90" in hash_bloom)  # True: hash is in the NSRL set
>>> print("testTESTtest" in hash_bloom)  # False: not a known hash
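
To illustrate what a class like this does internally, here is a minimal pure-Python Bloom filter sketch (Python 3). It is not the repository's `MD5BloomFilter`: the name `TinyBloom`, the SHA-256 double-hashing scheme, and the small capacity are all assumptions chosen for illustration.

```python
import hashlib
import math

class TinyBloom(object):
    """Illustrative Bloom filter; hypothetical, not the repository's class."""

    def __init__(self, capacity, error_rate):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = int(math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, int(round(self.num_bits / float(capacity) * math.log(2))))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bloom = TinyBloom(capacity=1000, error_rate=0.001)
bloom.add("F16FF81271ADA49847E6EB6BB9CB8A90")
print("F16FF81271ADA49847E6EB6BB9CB8A90" in bloom)  # True: added items are always found
print("testTESTtest" in bloom)  # False unless a (very unlikely) false positive occurs
```

A membership test never misses an item that was added, but it can occasionally report an item that was not; the error rate bounds how often that happens once the filter is filled to capacity.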

## TODO

Figure out how to load an existing hash_bloom.bloomfilter file that is already populated with data and query it without re-ingesting the NSRL file.
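
One possible direction, sketched here with the standard library only: store the filter's parameters alongside its bit array so that a later session can reopen the file and query it directly. The on-disk layout (a length-prefixed JSON header followed by the raw bits) and the function names `save_filter` and `load_filter` are hypothetical; the actual format of the `.bloomfilter` file above depends on the library the class wraps.

```python
import json
import struct

def save_filter(path, num_bits, num_hashes, bits):
    """Write filter parameters and raw bits to `path` (hypothetical layout)."""
    header = json.dumps({"num_bits": num_bits, "num_hashes": num_hashes}).encode("utf-8")
    with open(path, "wb") as fh:
        fh.write(struct.pack(">I", len(header)))  # 4-byte big-endian header length
        fh.write(header)
        fh.write(bits)

def load_filter(path):
    """Read back (num_bits, num_hashes, bits) written by save_filter."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack(">I", fh.read(4))
        params = json.loads(fh.read(header_len).decode("utf-8"))
        bits = bytearray(fh.read())
    return params["num_bits"], params["num_hashes"], bits
```

Keeping the bit count and hash count in the file matters because a query must recompute exactly the same bit positions that were set during ingestion.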