bloom icon indicating copy to clipboard operation
bloom copied to clipboard

Strange false positive with single item capacity filter

Open dotjim opened this issue 1 year ago • 0 comments
trafficstars

Greetings,

We've encountered a strange false positive case with a filter provisioned with single capacity and 1:billion error rate. If we increase the filter capacity to two or higher the false positive disappears, or if we keep the capacity at one and use a different error rate (e.g. 1:million) the false positive disappears.

A test to illustrate against the latest release v3.6.0:

func TestBitsAndBloom(t *testing.T) {
  assert := assert.New(t)

  // Capacity = 1 with 1:million error rate - passes
  filter := bitsandbloom.NewWithEstimates(1, 0.00000001)
  filter.AddString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685")
  assert.True(filter.TestString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685"), "Should exist")      // Passes
  assert.False(filter.TestString("8854e4a0a9b78632e8de91ffb973336a7a4f1b9a89520149ae311416de42d162"), "Should not exist") // Passes

  // Capacity = 2 with 1:billion error rate - passes
  filter = bitsandbloom.NewWithEstimates(2, 0.000000001)
  filter.AddString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685")
  assert.True(filter.TestString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685"), "Should exist")      // Passes
  assert.False(filter.TestString("8854e4a0a9b78632e8de91ffb973336a7a4f1b9a89520149ae311416de42d162"), "Should not exist") // Passes

  // Capacity = 1 with 1:billion error rate - fails
  filter = bitsandbloom.NewWithEstimates(1, 0.000000001)
  filter.AddString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685")
  assert.True(filter.TestString("d39bafed5f63e8e85c4bbcf171e3a7bb4f8542dc4237cd94d0e0ff7c08e5c685"), "Should exist")      // Passes
  assert.False(filter.TestString("8854e4a0a9b78632e8de91ffb973336a7a4f1b9a89520149ae311416de42d162"), "Should not exist") // Fails
}

We appreciate a single item capacity filter isn't the intended use case of a bloom filter (an edge case on our side), but the behaviour is curious. If not considered a bug, could readme guidance be included on any minimum capacity to achieve the required error rate?

Thanks, -Dotjim

dotjim avatar Dec 15 '23 16:12 dotjim