python-bloomfilter

Bug: Inconsistency in how `fromfile`, `intersection`, and `union` deal with `self.bitarray`

Open srean opened this issue 7 years ago • 0 comments

After you deserialize a serialized BloomFilter object, the length of `self.bitarray` may differ from the original because of added padding.

https://github.com/jaybaird/python-bloomfilter/blob/master/pybloom/pybloom.py#L271

Here, the difference in length due to the trailing padding bits is ignored.

No such accounting for differing bitarray lengths is done here https://github.com/jaybaird/python-bloomfilter/blob/master/pybloom/pybloom.py#L224 or here https://github.com/jaybaird/python-bloomfilter/blob/master/pybloom/pybloom.py#L238 . There, the bitarray union and intersection will fail if the `bitarray.length()` values are different. The lengths may differ after a round trip through serialization and deserialization, even when the capacities and error rates are the same.
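To illustrate how the lengths can diverge, here is a minimal stdlib-only sketch. It does not use pybloom or bitarray: `pack_bits` and `unpack_bits` are hypothetical stand-ins for byte-oriented serialization, and the 13-bit size is an arbitrary value chosen because it is not a multiple of 8.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, zero-padding the last byte."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for j, bit in enumerate(padded[i:i + 8]):
            byte |= bit << j  # little-endian bit order within each byte
        out.append(byte)
    return bytes(out)


def unpack_bits(data):
    """Unpack bytes into a list of 0/1 values; the padding bits come back too."""
    return [(byte >> j) & 1 for byte in data for j in range(8)]


num_bits = 13  # arbitrary, deliberately not a multiple of 8
original = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
roundtripped = unpack_bits(pack_bits(original))

# The payload survives, but the round trip appended padding bits.
print(len(original), len(roundtripped))
print(roundtripped[:num_bits] == original)
```

The same effect is what makes a freshly constructed filter and a deserialized one disagree on length, even though they describe the same filter.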

I think the correct thing to do here is to strip off the padding in `fromfile`, so that the bitarray representation is exactly the same as before serialization.
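A rough sketch of what stripping the padding could look like, written against plain Python lists rather than bitarray objects. `strip_padding` and `union` here are hypothetical helpers, not pybloom's API; in the real `fromfile`, the equivalent step would presumably be truncating the loaded bitarray to `num_bits`.

```python
def strip_padding(bits, num_bits):
    """Return `bits` truncated to `num_bits`.

    Hypothetical helper: tolerates at most 7 trailing padding bits and
    rejects anything else as a length mismatch.
    """
    extra = len(bits) - num_bits
    if not 0 <= extra < 8:
        raise ValueError('Bit length mismatch!')
    return bits[:num_bits]


def union(a, b):
    """Element-wise OR that, like the behavior described above, needs equal lengths."""
    if len(a) != len(b):
        raise ValueError('length mismatch')
    return [x | y for x, y in zip(a, b)]


num_bits = 13                        # arbitrary, not a multiple of 8
fresh = [0] * num_bits               # filter built in memory
loaded = [1] * num_bits + [0] * 3    # same-sized filter after a padded round trip

# Without trimming, union(fresh, loaded) would raise; with it, lengths agree.
merged = union(fresh, strip_padding(loaded, num_bits))
print(len(merged))
```

Trimming once at load time keeps `union` and `intersection` unchanged, at the cost of `fromfile` having to trust the `num_bits` value read from the header.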

srean, May 03 '18 08:05