BoomFilters
BoomFilters copied to clipboard
Scalable Bloom Filter panics after ~31 million insertions
Hey @tylertreat! I noticed that NewDefaultScalableBloomFilter(0.01)
panics after exactly 31,536,466 insertions. Here is a test case:
func TestScalableBloomLarge(t *testing.T) {
total := uint32(40000000) // 40 million
f := NewDefaultScalableBloomFilter(0.01)
i := uint32(0)
defer func() {
if err := recover(); err != nil {
fmt.Printf("panicked after %d iterations: %s\n", i, err)
panic(err)
}
}()
for ; i < total; i++ {
if i%1000000 == 0 {
fmt.Printf(" i: %d\n", i)
}
key := make([]byte, 8)
binary.BigEndian.PutUint32(key, i)
f.Add(key)
}
}
and here is the test ouput:
=== RUN TestScalableBloomLarge
i: 0
i: 1000000
i: 2000000
# ...
i: 29000000
i: 30000000
i: 31000000
panicked after 31536466 iterations: runtime error: makeslice: len out of range
--- FAIL: TestScalableBloomLarge (215.81s)
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range
goroutine 132 [running]:
testing.tRunner.func1(0xc000100f00)
/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:830 +0x392
panic(0x1174160, 0x11d3170)
/usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
github.com/tylertreat/BoomFilters.TestScalableBloomLarge.func1(0xc00007cf7c)
/Users/jameshageman/stripe/BoomFilters/scalable_test.go:179 +0x118
panic(0x1174160, 0x11d3170)
/usr/local/Cellar/go/1.12.1/libexec/src/runtime/panic.go:522 +0x1b5
github.com/tylertreat/BoomFilters.NewPartitionedBloomFilter(0x2710, 0x357fb461c091b, 0x54e5e2762f38eb)
/Users/jameshageman/stripe/BoomFilters/partitioned.go:52 +0xb1
github.com/tylertreat/BoomFilters.(*ScalableBloomFilter).addFilter(0xc00028a000)
/Users/jameshageman/stripe/BoomFilters/scalable.go:148 +0x6a
github.com/tylertreat/BoomFilters.(*ScalableBloomFilter).Add(0xc00028a000, 0xc0d18e4858, 0x8, 0x8, 0x11d8120, 0xc00028a000)
/Users/jameshageman/stripe/BoomFilters/scalable.go:120 +0x112
github.com/tylertreat/BoomFilters.TestScalableBloomLarge(0xc000100f00)
/Users/jameshageman/stripe/BoomFilters/scalable_test.go:189 +0xc6
testing.tRunner(0xc000100f00, 0x11ad908)
/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:865 +0xc0
created by testing.(*T).Run
/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:916 +0x35a
FAIL github.com/tylertreat/BoomFilters 218.857s
The culprit appears to be this line: https://github.com/tylertreat/BoomFilters/blob/611b3dbe80e85df3a0a10a43997d4d5784e86245/partitioned.go#L52
which confuses me because k
should be a totally reasonable size, and is only a function of fpr
and not n
.
Fascinating discovery and sorry you hit this bug! I will take a look and see if I can figure out what's going on.
So unfortunately the issue is k
ends up being too large to allocate a slice of that length. Here's what ends up getting computed in NewPartitionedBloomFilter
on the 31536466th iteration:
n 10000
fpRate +4.649956e-309
k 9223372036854775808
This shows what's happening: https://play.golang.org/p/fJD2OaXK2Bc
I'm not sure what exactly to do here since it is the nature of the scalable bloom filter to add continuously larger filters in order to enforce a tight upper bound on false positives.
@tylertreat I see, I assume it would take longer to hit this limit if you set a higher hint
? It would seem intuitive as 31 million is so much larger than the default hint of 10000
.
In this case the limit is hit irrespective of hint
since the slice length is k
, which is computed based on the target fpRate
alone:
https://github.com/tylertreat/BoomFilters/blob/611b3dbe80e85df3a0a10a43997d4d5784e86245/partitioned.go#L51
https://github.com/tylertreat/BoomFilters/blob/611b3dbe80e85df3a0a10a43997d4d5784e86245/partitioned.go#L52
test the limition of the k. where len(filters) to large, the fpRate will become smaller and smaller, and result of 1/fpRate became +Inf, so that make make func panic
playground: test filters limit
package main
import (
"fmt"
"math"
)
func main() {
fp := 0.01
r := 0.8
for i := 100; i < 100000000; i = i + 400 {
fpRate := fp * math.Pow(r, float64(i))
k := math.Ceil(math.Log2(1 / fpRate))
fmt.Printf("%+v,%+v, %+v\n", i, fpRate, k)
_ = make([]*Buckets, uint(k))
}
}
type Buckets struct {
data []byte
bucketSize uint8
max uint8
count uint
}
100,2.037035976334508e-12, 39
500,3.507466211043593e-51, 168
900,6.039323489882386e-90, 297
1300,1.0398796744101223e-128, 426
1700,1.790514681094456e-167, 554
2100,3.08299402527833e-206, 683
2500,5.3084469288417205e-245, 812
2900,9.140338438957912e-284, 941
3300,1.6e-322, +Inf
panic: runtime error: makeslice: len out of range
goroutine 1 [running]:
main.main()
/tmp/sandbox2242876909/prog.go:16 +0x3f
Program exited.