cuckoofilter
cuckoofilter copied to clipboard
Optimize the "Hash" and make it faster
github.com/seiflotfy/cuckoofilter first calculate the hash when it's in initialization.
And I try this to optimize this "better" version of cuckoofilter.
Test is all passed.
=== RUN TestBucket_Reset
--- PASS: TestBucket_Reset (0.00s)
=== RUN TestInsertion
--- PASS: TestInsertion (0.03s)
=== RUN TestLookup
=== RUN TestLookup/cf.Lookup("one")
=== PAUSE TestLookup/cf.Lookup("one")
=== RUN TestLookup/cf.Lookup("two")
=== PAUSE TestLookup/cf.Lookup("two")
=== RUN TestLookup/cf.Lookup("three")
=== PAUSE TestLookup/cf.Lookup("three")
=== RUN TestLookup/cf.Lookup("four")
=== PAUSE TestLookup/cf.Lookup("four")
=== RUN TestLookup/cf.Lookup("five")
=== PAUSE TestLookup/cf.Lookup("five")
=== CONT TestLookup/cf.Lookup("one")
=== CONT TestLookup/cf.Lookup("four")
=== CONT TestLookup/cf.Lookup("five")
=== CONT TestLookup/cf.Lookup("three")
=== CONT TestLookup/cf.Lookup("two")
--- PASS: TestLookup (0.00s)
--- PASS: TestLookup/cf.Lookup("one") (0.00s)
--- PASS: TestLookup/cf.Lookup("four") (0.00s)
--- PASS: TestLookup/cf.Lookup("five") (0.00s)
--- PASS: TestLookup/cf.Lookup("three") (0.00s)
--- PASS: TestLookup/cf.Lookup("two") (0.00s)
=== RUN TestFilter_LookupLarge
--- PASS: TestFilter_LookupLarge (0.09s)
=== RUN TestFilter_Insert
--- PASS: TestFilter_Insert (0.00s)
=== RUN TestDelete
=== RUN TestDelete/cf.Delete("four")
=== RUN TestDelete/cf.Delete("five")
=== RUN TestDelete/cf.Delete("one")
=== RUN TestDelete/cf.Delete("two")
=== RUN TestDelete/cf.Delete("three")
--- PASS: TestDelete (0.00s)
--- PASS: TestDelete/cf.Delete("four") (0.00s)
--- PASS: TestDelete/cf.Delete("five") (0.00s)
--- PASS: TestDelete/cf.Delete("one") (0.00s)
--- PASS: TestDelete/cf.Delete("two") (0.00s)
--- PASS: TestDelete/cf.Delete("three") (0.00s)
=== RUN TestDeleteMultipleSame
cuckoofilter_test.go:229: Filter state full: &{[[65037 65037 65037 65037] [65037 0 0 0]] 5 1}
=== RUN TestDeleteMultipleSame/cf.Delete("missing")
=== RUN TestDeleteMultipleSame/cf.Delete("missing2")
=== RUN TestDeleteMultipleSame/cf.Delete("some_item")
=== RUN TestDeleteMultipleSame/cf.Delete("some_item")#01
=== RUN TestDeleteMultipleSame/cf.Delete("some_item")#02
=== RUN TestDeleteMultipleSame/cf.Delete("some_item")#03
=== RUN TestDeleteMultipleSame/cf.Delete("some_item")#04
=== RUN TestDeleteMultipleSame/cf.Delete("some_item")#05
--- PASS: TestDeleteMultipleSame (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("missing") (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("missing2") (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("some_item") (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("some_item")#01 (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("some_item")#02 (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("some_item")#03 (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("some_item")#04 (0.00s)
--- PASS: TestDeleteMultipleSame/cf.Delete("some_item")#05 (0.00s)
=== RUN TestEncodeDecode
--- PASS: TestEncodeDecode (0.00s)
=== RUN TestIndexAndFP
--- PASS: TestIndexAndFP (0.00s)
=== RUN FuzzDecode
=== RUN FuzzDecode/seed#0
--- PASS: FuzzDecode (0.00s)
--- PASS: FuzzDecode/seed#0 (0.00s)
=== RUN Example
--- PASS: Example (0.00s)
=== RUN ExampleFilter_Lookup
--- PASS: ExampleFilter_Lookup (0.00s)
=== RUN ExampleFilter_Delete
--- PASS: ExampleFilter_Delete (0.00s)
=== RUN Example_threadSafe
--- PASS: Example_threadSafe (0.00s)
Here is the benchmark result:
Before optimization:
BenchmarkFilter_Reset
BenchmarkFilter_Reset-8 97090 12767 ns/op
BenchmarkFilter_Insert
BenchmarkFilter_Insert-8 39633634 30.11 ns/op
BenchmarkFilter_Lookup
BenchmarkFilter_Lookup-8 37294400 31.90 ns/op
PASS
After optimization:
BenchmarkFilter_Reset
BenchmarkFilter_Reset-8 97978 12345 ns/op
BenchmarkFilter_Insert
BenchmarkFilter_Insert-8 54499240 22.14 ns/op
BenchmarkFilter_Lookup
BenchmarkFilter_Lookup-8 57099548 20.85 ns/op
PASS
github.com/seiflotfy/cuckoofilter(The benchmark file has been replaced by the same.):
BenchmarkFilter_Insert
BenchmarkFilter_Insert-8 38737519 29.49 ns/op
BenchmarkFilter_Lookup
BenchmarkFilter_Lookup-8 44468107 27.07 ns/op
BenchmarkFilter_Reset
BenchmarkFilter_Reset-8 97950 12242 ns/op
PASS
@panmari
Hello, sorry for delay.
Here is the evaluation benchmark result.
The source code : https://github.com/MeteorsLiu/test-cmp
- Compare Randi(Wyrng VS Go):
BenchmarkRandiGoMod-8 75846440 15.78 ns/op
BenchmarkRandiGo-8 76128673 15.77 ns/op
BenchmarkRandiWyrng-8 196768308 5.974 ns/op
BenchmarkRandiWyrngMod-8 200895794 5.999 ns/op
- Compare Hash Algorithm (XXH3 VS Metro)
BenchmarkMetroHash-8 175887223 6.801 ns/op
BenchmarkXXH3Hash-8 176485131 6.815 ns/op
- Compare Static looking up with hashing again.
BenchmarkMetroHash-8 175887223 6.801 ns/op
BenchmarkXXH3Hash-8 176485131 6.815 ns/op
BenchmarkStaticHash-8 1000000000 0.2779 ns/op
Answer:
- This makes it much harder to read, is the optimization really worth it? Please do a separate PR/evaluation.
Yes, it worths.
Golang's rand has a mutex lock, which slows down the process, while SRNG doesn't require a lock(replaced by atomical operation)
Also, Wyrng is a faster pseudorandom algorithm than the go's.
- What's the advantage in deriving endianes? Hard-coding either big or little for all platforms should do the same afaik.
Maybe no?
- This comes with a memory penalty of 2^16 /1024 = 64kb. Please do a separate PR/evaluation to see if it's worth.
Although it takes 64kb to store the hash result, its advantage is that it only calculate for the first time.
When someone is quering the cuckoofilter for many times, it's a great advantage without doubt.
Because getAltIndex is used for many times, which calculates the repeat Hash value for many times, this repeat calculation can be reduced by this.
And I believe that 64kb is a "small" memory block, while cuckoofilter may takes 2MB to store the hash entries.
@panmari
And Why I replace the MetroHash with XXH3 .
According to the tests, https://github.com/rurban/smhasher
MetroHash 64 has a LongNeighbor problem, while XXH3 hasn't.
https://github.com/hmakholm/smhasher/blob/master/src/LongNeighborTest.md
As far as i'm concerned, XXH3 is more quality than Metro. Also, according to my benchmark result, its speed is close to each other.
A quality Hash algorithm will have a great impact on the correctness of the cuckoofilter.
Thanks for breaking up the evaluation!
I would still prefer merging/discussing this as separate pull requests. Could you please split them up? See also comments above.
Sorry, i misunderstand your meaning. I will separate this requests a few days later.