builtins: Speed up set unions
This PR fixes a performance regression for the set `union` builtin, discovered in issue #4979.

The original logic for the builtin did pairwise `Set.Union` calls between the input sets, which created a lot of temporary sets that were almost immediately discarded and duplicated a lot of work. The updated builtin inlines the logic from `Set.Union`, so that only one pass is made across the incoming sets' members.

In the included benchmarks, this provides a small but significant improvement over using a pure-Rego comprehension to do the same job.
Fixes #4979
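To make the difference concrete, here is a minimal, runnable sketch of the two strategies over a plain map-backed set. The `set` type and helper names are hypothetical stand-ins for illustration only; they are not OPA's `ast.Set` API or the builtin's actual code.

```go
// Illustrative sketch only: a plain map-backed set standing in for OPA's
// internal set type, showing why pairwise unions waste work compared to a
// single pass. All names here are hypothetical.
package main

import "fmt"

type set map[string]struct{}

// union2 builds a brand-new set from two inputs, copying both.
func union2(a, b set) set {
	out := make(set, len(a)+len(b))
	for k := range a {
		out[k] = struct{}{}
	}
	for k := range b {
		out[k] = struct{}{}
	}
	return out
}

// pairwiseUnion mirrors the old builtin's shape: every step allocates a
// fresh temporary set and re-copies everything accumulated so far.
func pairwiseUnion(sets []set) set {
	result := set{}
	for _, s := range sets {
		result = union2(result, s)
	}
	return result
}

// singlePassUnion mirrors the new builtin's shape: one result set, one pass
// over every member of every input set, no throwaway intermediates.
func singlePassUnion(sets []set) set {
	result := set{}
	for _, s := range sets {
		for k := range s {
			result[k] = struct{}{}
		}
	}
	return result
}

func main() {
	sets := []set{
		{"a": {}, "b": {}},
		{"b": {}, "c": {}},
		{"c": {}, "d": {}},
	}
	fmt.Println(len(pairwiseUnion(sets)), len(singlePassUnion(sets))) // 4 4
}
```

For N input sets of M members each, the pairwise version performs roughly O(N²·M) insertions and allocates N throwaway sets, while the single pass performs O(N·M) insertions into one result, which is why the gap widens in the larger benchmark cases below.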
The three relevant benchmarks are shown below; a rough sketch of how such NxM cases can be generated follows this list.

- `union` on `main` branch
- Pure Rego implementation (set comprehension)
- `union` on this PR
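For context on the NxM cases in the results below, a benchmark over the map-backed sketch above could be shaped roughly like this. The generator, overlap scheme, and names are assumptions for illustration, not the PR's actual `BenchmarkSetUnion`/`BenchmarkSetUnionSlow` code; it would live in a `_test.go` file alongside the earlier sketch.

```go
// Hypothetical benchmark sketch: build N sets of M members each (with some
// overlap so the union has real duplicates) and union them in the loop.
// Reuses the set type and singlePassUnion helper from the sketch above.
package main

import (
	"fmt"
	"testing"
)

func genSets(n, m int) []set {
	sets := make([]set, n)
	for i := range sets {
		s := make(set, m)
		for j := 0; j < m; j++ {
			s[fmt.Sprintf("v%d", i*m/2+j)] = struct{}{}
		}
		sets[i] = s
	}
	return sets
}

func BenchmarkSinglePassUnion(b *testing.B) {
	for _, n := range []int{10, 100, 250} {
		for _, m := range []int{10, 100, 250} {
			sets := genSets(n, m)
			b.Run(fmt.Sprintf("%dx%d", n, m), func(b *testing.B) {
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					_ = singlePassUnion(sets)
				}
			})
		}
	}
}
```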
Benchmarks
Benchmark of `union` on `main` branch:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkSetUnion$ github.com/open-policy-agent/opa/topdown
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/topdown
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkSetUnion/10x10-8 2229 544644 ns/op 232683 B/op 3759 allocs/op
BenchmarkSetUnion/10x100-8 265 4402486 ns/op 1754574 B/op 24439 allocs/op
BenchmarkSetUnion/10x250-8 100 11489154 ns/op 4423815 B/op 60298 allocs/op
BenchmarkSetUnion/100x10-8 62 19831002 ns/op 7055828 B/op 32975 allocs/op
BenchmarkSetUnion/100x100-8 5 216948499 ns/op 65802438 B/op 228261 allocs/op
BenchmarkSetUnion/100x250-8 2 639396404 ns/op 159630756 B/op 570903 allocs/op
BenchmarkSetUnion/250x10-8 9 117371750 ns/op 38499626 B/op 86745 allocs/op
BenchmarkSetUnion/250x100-8 1 1469090944 ns/op 358555656 B/op 622447 allocs/op
BenchmarkSetUnion/250x250-8 1 4215347082 ns/op 963369008 B/op 1594872 allocs/op
PASS
ok github.com/open-policy-agent/opa/topdown 16.348s
```
Benchmark of Pure Rego implementation:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkSetUnionSlow$ github.com/open-policy-agent/opa/topdown
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/topdown
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkSetUnionSlow/10x10-8 2569 461347 ns/op 194384 B/op 4015 allocs/op
BenchmarkSetUnionSlow/10x100-8 343 4485940 ns/op 1403187 B/op 27487 allocs/op
BenchmarkSetUnionSlow/10x250-8 130 8754293 ns/op 3479220 B/op 68079 allocs/op
BenchmarkSetUnionSlow/100x10-8 243 4330704 ns/op 1632044 B/op 33589 allocs/op
BenchmarkSetUnionSlow/100x100-8 34 33604653 ns/op 12636059 B/op 244572 allocs/op
BenchmarkSetUnionSlow/100x250-8 13 89953012 ns/op 31412245 B/op 610317 allocs/op
BenchmarkSetUnionSlow/250x10-8 100 10220165 ns/op 4017694 B/op 82992 allocs/op
BenchmarkSetUnionSlow/250x100-8 12 101469788 ns/op 31077606 B/op 606872 allocs/op
BenchmarkSetUnionSlow/250x250-8 3 367955598 ns/op 80279797 B/op 1514052 allocs/op
PASS
ok github.com/open-policy-agent/opa/topdown 13.738s
```
Benchmark of `union` on this PR:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkSetUnion$ github.com/open-policy-agent/opa/topdown
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/opa/topdown
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkSetUnion/10x10-8 2788 385342 ns/op 174161 B/op 3593 allocs/op
BenchmarkSetUnion/10x100-8 433 2946532 ns/op 1224484 B/op 24095 allocs/op
BenchmarkSetUnion/10x250-8 123 9764934 ns/op 3036327 B/op 59735 allocs/op
BenchmarkSetUnion/100x10-8 313 3304217 ns/op 1442761 B/op 29577 allocs/op
BenchmarkSetUnion/100x100-8 45 27416137 ns/op 10991659 B/op 213286 allocs/op
BenchmarkSetUnion/100x250-8 15 73213722 ns/op 27342127 B/op 533573 allocs/op
BenchmarkSetUnion/250x10-8 138 8663335 ns/op 3553444 B/op 72984 allocs/op
BenchmarkSetUnion/250x100-8 16 71706943 ns/op 26999915 B/op 529107 allocs/op
BenchmarkSetUnion/250x250-8 6 199493126 ns/op 70179568 B/op 1323346 allocs/op
PASS
ok github.com/open-policy-agent/opa/topdown 15.198s
```
Summary of Results
The benchmarks show that the PR is roughly 15-30% faster and has ~14% fewer allocations than the pure-Rego solution. The resolution of these charts is blown out by the 10x10 and 250x250 cases, but they still paint a strong picture of the performance changes nonetheless.
I'm going to add some correctness/regression tests, at the suggestion of @charlesdaniels. The tests will compare the behavior of `union(...)` against what the reference should produce. These should be cheap to run, and will help us prevent a silly regression if our existing test suite doesn't already check for this stuff.
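A minimal shape for such a regression test, again using the hypothetical helpers from the earlier sketches rather than OPA's real topdown test harness: compute the union with both the straightforward pairwise reference and the optimized single pass, and require identical results.

```go
// Hypothetical correctness/regression test sketch: the optimized single-pass
// union must agree with the naive pairwise reference across input shapes.
// Relies on the set type, genSets, pairwiseUnion, and singlePassUnion
// helpers defined in the earlier sketches.
package main

import (
	"fmt"
	"reflect"
	"testing"
)

func TestUnionMatchesReference(t *testing.T) {
	cases := []struct{ n, m int }{{0, 0}, {1, 1}, {3, 5}, {10, 10}}
	for _, c := range cases {
		t.Run(fmt.Sprintf("%dx%d", c.n, c.m), func(t *testing.T) {
			sets := genSets(c.n, c.m)
			want := pairwiseUnion(sets)  // reference result
			got := singlePassUnion(sets) // optimized path under test
			if !reflect.DeepEqual(got, want) {
				t.Fatalf("union mismatch: got %d members, want %d", len(got), len(want))
			}
		})
	}
}
```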
I've split the `union` tests/benchmarks off into their own files, since I'm planning to do the same soon for the `intersection` builtin, and `topdown_bench_test` was getting a bit crowded.
@tsandall @srenatus This PR should be ready for review! :smile:
LGTM. In this case, inlining the union logic is cheap enough that the speedup is worth it.
@tsandall Thanks for looking the PR over! I'll merge this in soon.