bloom
bloom copied to clipboard
Strange false positive with single item capacity filter
trafficstars
Greetings,
We've encountered a strange false positive case with a filter provisioned with single capacity and 1:billion error rate. If we increase the filter capacity to two or higher the false positive disappears, or if we keep the capacity at one and use a different error rate (e.g. 1:million) the false positive disappears.
A test to illustrate against the latest release v3.6.0:
func TestBitsAndBloom(t *testing.T) {
assert := assert.New(t)
// Capacity = 1 with 1:million error rate - passes
filter := bitsandbloom.NewWithEstimates(1, 0.00000001)
filter.AddString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685")
assert.True(filter.TestString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685"), "Should exist") // Passes
assert.False(filter.TestString("8854e4a0a9b78632e8de91ffb973336a7a4f1b9a89520149ae311416de42d162"), "Should not exist") // Passes
// Capacity = 2 with 1:billion error rate - passes
filter = bitsandbloom.NewWithEstimates(2, 0.000000001)
filter.AddString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685")
assert.True(filter.TestString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685"), "Should exist") // Passes
assert.False(filter.TestString("8854e4a0a9b78632e8de91ffb973336a7a4f1b9a89520149ae311416de42d162"), "Should not exist") // Passes
// Capacity = 1 with 1:billion error rate - fails
filter = bitsandbloom.NewWithEstimates(1, 0.000000001)
filter.AddString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685")
assert.True(filter.TestString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685"), "Should exist") // Passes
assert.False(filter.TestString("8854e4a0a9b78632e8de91ffb973336a7a4f1b9a89520149ae311416de42d162"), "Should not exist") // Fails
}
We appreciate a single item capacity filter isn't the intended use case of a bloom filter (an edge case on our side), but the behaviour is curious. If not considered a bug, could readme guidance be included on any minimum capacity to achieve the required error rate?
Thanks, -Dotjim