roaring Implement shotgun4Intersect

This is a second attempt at fixing https://github.com/RoaringBitmap/roaring/issues/215

It is based on a previous PR by @alldroll. The changes are as follows:

[X] The benchmark, though still unrealistic, is a tiny bit more realistic. Important: which benchmarking data processing code with branches, you don't want to run the same benchmark over the same data in sequence because the processor might get the ability to predict the branches with an unrealistic ability that is not achievable in the real world.
[X] It looks like the Go compiler is not smart enough to use conditional moves, as it should. So it does many, many branches when compiling the shotgun4 approach. We want to avoid that. A solution (which is kind of a hack) is to replace the branches ("if") by some arithmetic. The arithmetic is more expensive (many more instructions) than a simple conditional move, but saving on mispredicted branches and useless speculative loads is important.

Results should vary but here is what I get after making these changes...

$ go version
go version go1.13 darwin/amd64
lemire@iMac-2018 ~/CVS/github/roaring $ go test  -bench BenchmarkIntersectAlgorithms -run -
goos: darwin
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
BenchmarkIntersectAlgorithms/onesidedgallopingintersect2by2-8         	  123374	      9254 ns/op
BenchmarkIntersectAlgorithms/shotgun4-8                               	  203089	      5521 ns/op
PASS
ok  	github.com/RoaringBitmap/roaring	2.434s

So shotgun4 is "almost" twice as fast. Not bad!

Of course, more work is needed...

[ ] We should use even richer, more realistic testing.
[ ] We need better testing. We pass some sanity tests, but I cannot be certain that the code is correct.
[ ] I cannot vouch that the code is optimal. At a minimum, we need to explore variations such as shotgun6, shotgun8, shotgun2. Note that the results will depend on the hardware, better/more recent hardware should benefit from wider algorithms.
[ ] We need to actually integrate the new approach in the roaring code and do realistic benchmarks there.

I propose that the work be done in the shotgun4 branch directly in the main repository.

Sep 25 '19 17:09 lemire

Coverage increased (+0.2%) to 82.404% when pulling f4760127538ca2fa412056292a29a60ce2431344 on shotgun4 into 4676818d7478f72f5041418f5afbb15a5080dbb7 on master.

Sep 25 '19 18:09 coveralls

Thanks! Let us finish this!

Sep 26 '19 12:09 lemire

I am leaving this PR intentionally open.

Sep 30 '19 17:09 lemire

Tried to follow-up on this branch here: https://github.com/puzpuzpuz/roaring/commit/3ac595940da689dbaeaf9d2899e65129e82eda68

The changes are the following:

Improve tests for shotgun4Intersect to get 100% LoC coverage
Simplify shotgun4Intersect
Add real data benchmark for And

The real data benchmark I've added looks like the following:

func BenchmarkRealDataAnd(b *testing.B) {
	benchmarkRealDataAggregate(b, func(bitmaps []*Bitmap) uint64 {
		b := bitmaps[0]
		for i := 1; i < len(bitmaps); i++ {
			b = And(b, bitmaps[i])
		}
		return b.GetCardinality()
	})
}

I'm not sure if it's good enough though. Looking forward to hear your thoughts. The results on my machine are the following.

Environment: Ubuntu 20.04 x86-64, Go 1.16.5, i5-8300h CPU

onesidedgallopingintersect2by2:

$ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8         	  112808	     10660 ns/op
BenchmarkRealDataAnd/census-income-8             	  116253	     10362 ns/op
BenchmarkRealDataAnd/census1881_srt-8            	  135080	      8795 ns/op
BenchmarkRealDataAnd/census1881-8                	  130599	      9168 ns/op
BenchmarkRealDataAnd/dimension_003-8             	    1376	    885094 ns/op
BenchmarkRealDataAnd/dimension_008-8             	    4058	    308777 ns/op
BenchmarkRealDataAnd/dimension_033-8             	   25941	     46678 ns/op
BenchmarkRealDataAnd/uscensus2000-8              	  121574	      9786 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8       	   97990	     12082 ns/op
BenchmarkRealDataAnd/weather_sept_85-8           	   43671	     28879 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8    	  128178	     14734 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8        	  133525	      9009 ns/op
PASS
ok  	github.com/RoaringBitmap/roaring	39.732s

shotgun4Intersect:

$ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8         	  111625	     10605 ns/op
BenchmarkRealDataAnd/census-income-8             	  117244	     10248 ns/op
BenchmarkRealDataAnd/census1881_srt-8            	  133972	      8770 ns/op
BenchmarkRealDataAnd/census1881-8                	  131566	      9120 ns/op
BenchmarkRealDataAnd/dimension_003-8             	    1359	    871526 ns/op
BenchmarkRealDataAnd/dimension_008-8             	    4050	    292773 ns/op
BenchmarkRealDataAnd/dimension_033-8             	   26175	     44918 ns/op
BenchmarkRealDataAnd/uscensus2000-8              	  127453	      9230 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8       	   97687	     12014 ns/op
BenchmarkRealDataAnd/weather_sept_85-8           	   44283	     26967 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8    	  135856	      8815 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8        	  134486	      8959 ns/op
PASS
ok  	github.com/RoaringBitmap/roaring	38.144s

No significant difference except for the wikileaks-noquotes_srt data set.

Jul 10 '21 09:07 puzpuzpuz

Thanks. This is useful and seems to go toward my own bias: there may simply be not enough gain to make it worth it.

Jul 10 '21 16:07 lemire

Yes, so far I'm also under impression that shotgun intersection doesn't make the weather on operations with real data sets. I'm going to stop at this point and won't make further analysis. Going to check other GH issues and find the ones I could help with.

Jul 10 '21 19:07 puzpuzpuz

@puzpuzpuz That was helpful!

Jul 10 '21 19:07 lemire

roaring roaring copied to clipboard

Implement shotgun4Intersect

roaring
roaring copied to clipboard