roaring
roaring copied to clipboard
Implement shotgun4Intersect
This is a second attempt at fixing https://github.com/RoaringBitmap/roaring/issues/215
It is based on a previous PR by @alldroll. The changes are as follows:
-
[X] The benchmark, though still unrealistic, is a tiny bit more realistic. Important: which benchmarking data processing code with branches, you don't want to run the same benchmark over the same data in sequence because the processor might get the ability to predict the branches with an unrealistic ability that is not achievable in the real world.
-
[X] It looks like the Go compiler is not smart enough to use conditional moves, as it should. So it does many, many branches when compiling the shotgun4 approach. We want to avoid that. A solution (which is kind of a hack) is to replace the branches ("if") by some arithmetic. The arithmetic is more expensive (many more instructions) than a simple conditional move, but saving on mispredicted branches and useless speculative loads is important.
Results should vary but here is what I get after making these changes...
$ go version
go version go1.13 darwin/amd64
lemire@iMac-2018 ~/CVS/github/roaring $ go test -bench BenchmarkIntersectAlgorithms -run -
goos: darwin
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
BenchmarkIntersectAlgorithms/onesidedgallopingintersect2by2-8 123374 9254 ns/op
BenchmarkIntersectAlgorithms/shotgun4-8 203089 5521 ns/op
PASS
ok github.com/RoaringBitmap/roaring 2.434s
So shotgun4 is "almost" twice as fast. Not bad!
Of course, more work is needed...
- [ ] We should use even richer, more realistic testing.
- [ ] We need better testing. We pass some sanity tests, but I cannot be certain that the code is correct.
- [ ] I cannot vouch that the code is optimal. At a minimum, we need to explore variations such as shotgun6, shotgun8, shotgun2. Note that the results will depend on the hardware, better/more recent hardware should benefit from wider algorithms.
- [ ] We need to actually integrate the new approach in the roaring code and do realistic benchmarks there.
I propose that the work be done in the shotgun4 branch directly in the main repository.
Coverage increased (+0.2%) to 82.404% when pulling f4760127538ca2fa412056292a29a60ce2431344 on shotgun4 into 4676818d7478f72f5041418f5afbb15a5080dbb7 on master.
Thanks! Let us finish this!
I am leaving this PR intentionally open.
Tried to follow-up on this branch here: https://github.com/puzpuzpuz/roaring/commit/3ac595940da689dbaeaf9d2899e65129e82eda68
The changes are the following:
- Improve tests for shotgun4Intersect to get 100% LoC coverage
- Simplify shotgun4Intersect
- Add real data benchmark for
And
The real data benchmark I've added looks like the following:
func BenchmarkRealDataAnd(b *testing.B) {
benchmarkRealDataAggregate(b, func(bitmaps []*Bitmap) uint64 {
b := bitmaps[0]
for i := 1; i < len(bitmaps); i++ {
b = And(b, bitmaps[i])
}
return b.GetCardinality()
})
}
I'm not sure if it's good enough though. Looking forward to hear your thoughts. The results on my machine are the following.
Environment: Ubuntu 20.04 x86-64, Go 1.16.5, i5-8300h CPU
onesidedgallopingintersect2by2:
$ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8 112808 10660 ns/op
BenchmarkRealDataAnd/census-income-8 116253 10362 ns/op
BenchmarkRealDataAnd/census1881_srt-8 135080 8795 ns/op
BenchmarkRealDataAnd/census1881-8 130599 9168 ns/op
BenchmarkRealDataAnd/dimension_003-8 1376 885094 ns/op
BenchmarkRealDataAnd/dimension_008-8 4058 308777 ns/op
BenchmarkRealDataAnd/dimension_033-8 25941 46678 ns/op
BenchmarkRealDataAnd/uscensus2000-8 121574 9786 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8 97990 12082 ns/op
BenchmarkRealDataAnd/weather_sept_85-8 43671 28879 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8 128178 14734 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8 133525 9009 ns/op
PASS
ok github.com/RoaringBitmap/roaring 39.732s
shotgun4Intersect:
$ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8 111625 10605 ns/op
BenchmarkRealDataAnd/census-income-8 117244 10248 ns/op
BenchmarkRealDataAnd/census1881_srt-8 133972 8770 ns/op
BenchmarkRealDataAnd/census1881-8 131566 9120 ns/op
BenchmarkRealDataAnd/dimension_003-8 1359 871526 ns/op
BenchmarkRealDataAnd/dimension_008-8 4050 292773 ns/op
BenchmarkRealDataAnd/dimension_033-8 26175 44918 ns/op
BenchmarkRealDataAnd/uscensus2000-8 127453 9230 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8 97687 12014 ns/op
BenchmarkRealDataAnd/weather_sept_85-8 44283 26967 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8 135856 8815 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8 134486 8959 ns/op
PASS
ok github.com/RoaringBitmap/roaring 38.144s
No significant difference except for the wikileaks-noquotes_srt
data set.
Thanks. This is useful and seems to go toward my own bias: there may simply be not enough gain to make it worth it.
Yes, so far I'm also under impression that shotgun intersection doesn't make the weather on operations with real data sets. I'm going to stop at this point and won't make further analysis. Going to check other GH issues and find the ones I could help with.
@puzpuzpuz That was helpful!