BoomFilters icon indicating copy to clipboard operation
BoomFilters copied to clipboard

Scalable Bloom Filter panics after ~31 million insertions

Open jameshageman-stripe opened this issue 5 years ago • 5 comments

Hey @tylertreat! I noticed that NewDefaultScalableBloomFilter(0.01) panics after exactly 31,536,466 insertions. Here is a test case:

func TestScalableBloomLarge(t *testing.T) {
	total := uint32(40000000) // 40 million
	f := NewDefaultScalableBloomFilter(0.01)

	i := uint32(0)

	defer func() {
		if err := recover(); err != nil {
			fmt.Printf("panicked after %d iterations: %s\n", i, err)
			panic(err)
		}
	}()

	for ; i < total; i++ {
		if i%1000000 == 0 {
			fmt.Printf("  i: %d\n", i)
		}
		key := make([]byte, 8)
		binary.BigEndian.PutUint32(key, i)
		f.Add(key)
	}
}

and here is the test ouput:

=== RUN   TestScalableBloomLarge
  i: 0
  i: 1000000
  i: 2000000
# ...
  i: 29000000
  i: 30000000
  i: 31000000
panicked after 31536466 iterations: runtime error: makeslice: len out of range
--- FAIL: TestScalableBloomLarge (215.81s)
panic: runtime error: makeslice: len out of range [recovered]
	panic: runtime error: makeslice: len out of range [recovered]
	panic: runtime error: makeslice: len out of range

goroutine 132 [running]:
testing.tRunner.func1(0xc000100f00)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:830 +0x392
panic(0x1174160, 0x11d3170)
	/usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
github.com/tylertreat/BoomFilters.TestScalableBloomLarge.func1(0xc00007cf7c)
	/Users/jameshageman/stripe/BoomFilters/scalable_test.go:179 +0x118
panic(0x1174160, 0x11d3170)
	/usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
github.com/tylertreat/BoomFilters.NewPartitionedBloomFilter(0x2710, 0x357fb461c091b, 0x54e5e2762f38eb)
	/Users/jameshageman/stripe/BoomFilters/partitioned.go:52 +0xb1
github.com/tylertreat/BoomFilters.(*ScalableBloomFilter).addFilter(0xc00028a000)
	/Users/jameshageman/stripe/BoomFilters/scalable.go:148 +0x6a
github.com/tylertreat/BoomFilters.(*ScalableBloomFilter).Add(0xc00028a000, 0xc0d18e4858, 0x8, 0x8, 0x11d8120, 0xc00028a000)
	/Users/jameshageman/stripe/BoomFilters/scalable.go:120 +0x112
github.com/tylertreat/BoomFilters.TestScalableBloomLarge(0xc000100f00)
	/Users/jameshageman/stripe/BoomFilters/scalable_test.go:189 +0xc6
testing.tRunner(0xc000100f00, 0x11ad908)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:865 +0xc0
created by testing.(*T).Run
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:916 +0x35a
FAIL	github.com/tylertreat/BoomFilters	218.857s

The culprit appears to be this line: https://github.com/tylertreat/BoomFilters/blob/611b3dbe80e85df3a0a10a43997d4d5784e86245/partitioned.go#L52

which confuses me because k should be a totally reasonable size, and is only a function of fpr and not n.

jameshageman-stripe avatar Mar 21 '19 22:03 jameshageman-stripe

Fascinating discovery and sorry you hit this bug! I will take a look and see if I can figure out what's going on.

tylertreat avatar Mar 22 '19 02:03 tylertreat

So unfortunately the issue is k ends up being too large to allocate a slice of that length. Here's what ends up getting computed in NewPartitionedBloomFilter on the 31536466th iteration:

n 10000
fpRate +4.649956e-309
k 9223372036854775808

This shows what's happening: https://play.golang.org/p/fJD2OaXK2Bc

I'm not sure what exactly to do here since it is the nature of the scalable bloom filter to add continuously larger filters in order to enforce a tight upper bound on false positives.

tylertreat avatar Mar 22 '19 21:03 tylertreat

@tylertreat I see, I assume it would take longer to hit this limit if you set a higher hint? It would seem intuitive as 31 million is so much larger than the default hint of 10000.

jameshageman-stripe avatar Mar 22 '19 22:03 jameshageman-stripe

In this case the limit is hit irrespective of hint since the slice length is k, which is computed based on the target fpRate alone:

https://github.com/tylertreat/BoomFilters/blob/611b3dbe80e85df3a0a10a43997d4d5784e86245/partitioned.go#L51

https://github.com/tylertreat/BoomFilters/blob/611b3dbe80e85df3a0a10a43997d4d5784e86245/partitioned.go#L52

tylertreat avatar Mar 25 '19 03:03 tylertreat

test the limition of the k. where len(filters) to large, the fpRate will become smaller and smaller, and result of 1/fpRate became +Inf, so that make make func panic

playground: test filters limit

package main

import (
	"fmt"
	"math"
)

func main() {
	fp := 0.01
	r := 0.8
	for i := 100; i < 100000000; i = i + 400 {

		fpRate := fp * math.Pow(r, float64(i))
		k := math.Ceil(math.Log2(1 / fpRate))
		fmt.Printf("%+v,%+v, %+v\n", i, fpRate, k)
		_ = make([]*Buckets, uint(k))
	}
}

type Buckets struct {
	data       []byte
	bucketSize uint8
	max        uint8
	count      uint
}

100,2.037035976334508e-12, 39
500,3.507466211043593e-51, 168
900,6.039323489882386e-90, 297
1300,1.0398796744101223e-128, 426
1700,1.790514681094456e-167, 554
2100,3.08299402527833e-206, 683
2500,5.3084469288417205e-245, 812
2900,9.140338438957912e-284, 941
3300,1.6e-322, +Inf
panic: runtime error: makeslice: len out of range

goroutine 1 [running]:
main.main()
	/tmp/sandbox2242876909/prog.go:16 +0x3f

Program exited.

darcyaf avatar Oct 27 '22 07:10 darcyaf