builtins: Speed up set unions
This PR fixes a performance regression for the set `union` builtin, discovered in issue #4979.

The original logic for the builtin did pairwise `Set.Union` calls between the input sets, which created a lot of temporary sets that were almost immediately discarded and duplicated a lot of work. The updated builtin inlines the logic from `Set.Union`, so that only one pass is made across the incoming sets' members.

In the included benchmarks, this provides a small but significant improvement over using a pure-Rego comprehension to do the same job.
Fixes #4979
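To make the difference concrete, here is a minimal, runnable sketch of the two strategies over a plain map-backed set. The `set` type and helper names are hypothetical stand-ins for illustration only; they are not OPA's `ast.Set` API or the builtin's actual code.

```go
// Illustrative sketch only: a plain map-backed set standing in for OPA's
// internal set type, showing why pairwise unions waste work compared to a
// single pass. All names here are hypothetical.
package main

import "fmt"

type set map[string]struct{}

// union2 builds a brand-new set from two inputs, copying both.
func union2(a, b set) set {
	out := make(set, len(a)+len(b))
	for k := range a {
		out[k] = struct{}{}
	}
	for k := range b {
		out[k] = struct{}{}
	}
	return out
}

// pairwiseUnion mirrors the old builtin's shape: every step allocates a
// fresh temporary set and re-copies everything accumulated so far.
func pairwiseUnion(sets []set) set {
	result := set{}
	for _, s := range sets {
		result = union2(result, s)
	}
	return result
}

// singlePassUnion mirrors the new builtin's shape: one result set, one pass
// over every member of every input set, no throwaway intermediates.
func singlePassUnion(sets []set) set {
	result := set{}
	for _, s := range sets {
		for k := range s {
			result[k] = struct{}{}
		}
	}
	return result
}

func main() {
	sets := []set{
		{"a": {}, "b": {}},
		{"b": {}, "c": {}},
		{"c": {}, "d": {}},
	}
	fmt.Println(len(pairwiseUnion(sets)), len(singlePassUnion(sets))) // 4 4
}
```

For N input sets of M members each, the pairwise version performs roughly O(N²·M) insertions and allocates N throwaway sets, while the single pass performs O(N·M) insertions into one result, which is why the gap widens in the larger benchmark cases below.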
The three relevant benchmarks are shown below; a rough sketch of how such NxM cases can be generated follows this list.

- `union` on `main` branch
- Pure Rego implementation (set comprehension)
- `union` on this PR
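For context on the NxM cases in the results below, a benchmark over the map-backed sketch above could be shaped roughly like this. The generator, overlap scheme, and names are assumptions for illustration, not the PR's actual `BenchmarkSetUnion`/`BenchmarkSetUnionSlow` code; it would live in a `_test.go` file alongside the earlier sketch.

```go
// Hypothetical benchmark sketch: build N sets of M members each (with some
// overlap so the union has real duplicates) and union them in the loop.
// Reuses the set type and singlePassUnion helper from the sketch above.
package main

import (
	"fmt"
	"testing"
)

func genSets(n, m int) []set {
	sets := make([]set, n)
	for i := range sets {
		s := make(set, m)
		for j := 0; j < m; j++ {
			s[fmt.Sprintf("v%d", i*m/2+j)] = struct{}{}
		}
		sets[i] = s
	}
	return sets
}

func BenchmarkSinglePassUnion(b *testing.B) {
	for _, n := range []int{10, 100, 250} {
		for _, m := range []int{10, 100, 250} {
			sets := genSets(n, m)
			b.Run(fmt.Sprintf("%dx%d", n, m), func(b *testing.B) {
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					_ = singlePassUnion(sets)
				}
			})
		}
	}
}
```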
Benchmarks
Benchmark of `union` on `main` branch:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkSetUnion$ github.com/open-policy-agent/opa/topdown
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/topdown
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkSetUnion/10x10-8 2229 544644 ns/op 232683 B/op 3759 allocs/op
BenchmarkSetUnion/10x100-8 265 4402486 ns/op 1754574 B/op 24439 allocs/op
BenchmarkSetUnion/10x250-8 100 11489154 ns/op 4423815 B/op 60298 allocs/op
BenchmarkSetUnion/100x10-8 62 19831002 ns/op 7055828 B/op 32975 allocs/op
BenchmarkSetUnion/100x100-8 5 216948499 ns/op 65802438 B/op 228261 allocs/op
BenchmarkSetUnion/100x250-8 2 639396404 ns/op 159630756 B/op 570903 allocs/op
BenchmarkSetUnion/250x10-8 9 117371750 ns/op 38499626 B/op 86745 allocs/op
BenchmarkSetUnion/250x100-8 1 1469090944 ns/op 358555656 B/op 622447 allocs/op
BenchmarkSetUnion/250x250-8 1 4215347082 ns/op 963369008 B/op 1594872 allocs/op
PASS
ok github.com/open-policy-agent/opa/topdown 16.348s
```
Benchmark of Pure Rego implementation:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkSetUnionSlow$ github.com/open-policy-agent/opa/topdown
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/topdown
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkSetUnionSlow/10x10-8 2569 461347 ns/op 194384 B/op 4015 allocs/op
BenchmarkSetUnionSlow/10x100-8 343 4485940 ns/op 1403187 B/op 27487 allocs/op
BenchmarkSetUnionSlow/10x250-8 130 8754293 ns/op 3479220 B/op 68079 allocs/op
BenchmarkSetUnionSlow/100x10-8 243 4330704 ns/op 1632044 B/op 33589 allocs/op
BenchmarkSetUnionSlow/100x100-8 34 33604653 ns/op 12636059 B/op 244572 allocs/op
BenchmarkSetUnionSlow/100x250-8 13 89953012 ns/op 31412245 B/op 610317 allocs/op
BenchmarkSetUnionSlow/250x10-8 100 10220165 ns/op 4017694 B/op 82992 allocs/op
BenchmarkSetUnionSlow/250x100-8 12 101469788 ns/op 31077606 B/op 606872 allocs/op
BenchmarkSetUnionSlow/250x250-8 3 367955598 ns/op 80279797 B/op 1514052 allocs/op
PASS
ok github.com/open-policy-agent/opa/topdown 13.738s
```
Benchmark of `union` on this PR:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkSetUnion$ github.com/open-policy-agent/opa/topdown
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/topdown
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkSetUnion/10x10-8 2788 385342 ns/op 174161 B/op 3593 allocs/op
BenchmarkSetUnion/10x100-8 433 2946532 ns/op 1224484 B/op 24095 allocs/op
BenchmarkSetUnion/10x250-8 123 9764934 ns/op 3036327 B/op 59735 allocs/op
BenchmarkSetUnion/100x10-8 313 3304217 ns/op 1442761 B/op 29577 allocs/op
BenchmarkSetUnion/100x100-8 45 27416137 ns/op 10991659 B/op 213286 allocs/op
BenchmarkSetUnion/100x250-8 15 73213722 ns/op 27342127 B/op 533573 allocs/op
BenchmarkSetUnion/250x10-8 138 8663335 ns/op 3553444 B/op 72984 allocs/op
BenchmarkSetUnion/250x100-8 16 71706943 ns/op 26999915 B/op 529107 allocs/op
BenchmarkSetUnion/250x250-8 6 199493126 ns/op 70179568 B/op 1323346 allocs/op
PASS
ok github.com/open-policy-agent/opa/topdown 15.198s
```
Summary of Results
The benchmarks show that the PR is roughly 15-30% faster and has ~14% fewer allocations than the pure-Rego solution. The resolution of these charts is blown out by the 10x10 and 250x250 cases, but they still paint a strong picture of the performance changes nonetheless.
I'm going to add some correctness/regression tests, at the suggestion of @charlesdaniels. The tests will compare the behavior of `union(...)` against what the reference should produce. These should be cheap to run, and will help us prevent a silly regression if our existing test suite doesn't already check for this stuff.
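A minimal shape for such a regression test, again using the hypothetical helpers from the earlier sketches rather than OPA's real topdown test harness: compute the union with both the straightforward pairwise reference and the optimized single pass, and require identical results.

```go
// Hypothetical correctness/regression test sketch: the optimized single-pass
// union must agree with the naive pairwise reference across input shapes.
// Relies on the set type, genSets, pairwiseUnion, and singlePassUnion
// helpers defined in the earlier sketches.
package main

import (
	"fmt"
	"reflect"
	"testing"
)

func TestUnionMatchesReference(t *testing.T) {
	cases := []struct{ n, m int }{{0, 0}, {1, 1}, {3, 5}, {10, 10}}
	for _, c := range cases {
		t.Run(fmt.Sprintf("%dx%d", c.n, c.m), func(t *testing.T) {
			sets := genSets(c.n, c.m)
			want := pairwiseUnion(sets)  // reference result
			got := singlePassUnion(sets) // optimized path under test
			if !reflect.DeepEqual(got, want) {
				t.Fatalf("union mismatch: got %d members, want %d", len(got), len(want))
			}
		})
	}
}
```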
I've split the `union` tests/benchmarks off into their own files, since I'm planning to do the same soon for the `intersection` builtin, and `topdown_bench_test` was getting a bit crowded.
@tsandall @srenatus This PR should be ready for review! :smile:
LGTM. In this case, inlining the union logic is cheap enough that the speedup is worth it.
@tsandall Thanks for looking the PR over! I'll merge this in soon.