opa
opa copied to clipboard
union() is slower than pure Rego, can be up to O(n^3)
union()
is very slow in Rego because of how it is implemented currently.
Consider the implementation of the union
builtin:
// builtinSetUnion returns the union of the given input sets
func builtinSetUnion(a ast.Value) (ast.Value, error) {
inputSet, err := builtins.SetOperand(a, 1)
if err != nil {
return nil, err
}
result := ast.NewSet()
err = inputSet.Iter(func(x *ast.Term) error {
n, err := builtins.SetOperand(x.Value, 1)
if err != nil {
return err
}
result = result.Union(n)
return nil
})
return result, err
}
And also ast.set.Union()
:
// Union returns the set containing all elements of s and other.
func (s *set) Union(other Set) Set {
r := NewSet()
s.Foreach(func(x *Term) {
r.Add(x)
})
other.Foreach(func(x *Term) {
r.Add(x)
})
return r
}
If we have $n$ many sets to union, of size $k$ each, then that's going to result in $n$ many calls to ast.set.Union()
, except that each call to ast.set.Union()
is going to end up creating a whole new set and copying all of the existing elements over to it.
This means that each call to ast.set.Union()
takes increasingly more time, specifically $O(k), O(2k), O(3k), \dots$.
This means the running time is O(k * nth triangle number) = $O\left( \frac{(n+1)!}{2!(n-2)!} \cdot k\right) = O\left(\frac{1}{2} k n (n^2-1)\right) \approx O(kn^3)$.
Empirical Benchmarks
With the union()
builtin:
$ cat slowunion/slowunion.rego
package slowunion
SIZE := 250
NSETS := 250
nums := numbers.range(0, NSETS)
sizes := numbers.range(0, SIZE)
sets[n] = x {
nums[n]
x := {sprintf("%d,%d", [n, i]) | sizes[i]}
}
combined := union({s | s := sets[_]})
$ time opa eval --profile --format pretty --bundle ./slowunion/ --import "data.slowunion" 'count(data.slowunion.combined)'
63001
+------------------------------+------------+
| METRIC | VALUE |
+------------------------------+------------+
| timer_rego_load_bundles_ns | 583584 |
| timer_rego_module_compile_ns | 730417 |
| timer_rego_module_parse_ns | 162458 |
| timer_rego_query_compile_ns | 52125 |
| timer_rego_query_eval_ns | 2673904333 |
| timer_rego_query_parse_ns | 29625 |
+------------------------------+------------+
+--------------+----------+----------+-----------------------------+
| TIME | NUM EVAL | NUM REDO | LOCATION |
+--------------+----------+----------+-----------------------------+
| 2.543266402s | 4 | 254 | slowunion/slowunion.rego:14 |
| 115.799818ms | 63503 | 126253 | slowunion/slowunion.rego:11 |
| 3.801167ms | 2 | 2 | data.slowunion.combined |
| 152.332µs | 3 | 3 | slowunion/slowunion.rego:6 |
| 119.959µs | 1 | 251 | slowunion/slowunion.rego:10 |
| 95.999µs | 3 | 3 | slowunion/slowunion.rego:7 |
| 4.624µs | 1 | 1 | slowunion/slowunion.rego:4 |
| 2.125µs | 1 | 1 | slowunion/slowunion.rego:3 |
+--------------+----------+----------+-----------------------------+
real 0m2.699s
user 0m3.573s
sys 0m0.137s
With pure Rego instead:
$ cat slowunion/slowunion.rego
package slowunion
SIZE := 250
NSETS := 250
nums := numbers.range(0, NSETS)
sizes := numbers.range(0, SIZE)
sets[n] = x {
nums[n]
x := {sprintf("%d,%d", [n, i]) | sizes[i]}
}
#combined := union({s | s := sets[_]})
combined := {t | s := sets[_]; s[t]}
$ time opa eval --profile --format pretty --bundle ./slowunion/ --import "data.slowunion" 'count(data.slowunion.combined)'
63001
+------------------------------+-----------+
| METRIC | VALUE |
+------------------------------+-----------+
| timer_rego_load_bundles_ns | 538083 |
| timer_rego_module_compile_ns | 725375 |
| timer_rego_module_parse_ns | 176458 |
| timer_rego_query_compile_ns | 49208 |
| timer_rego_query_eval_ns | 327751125 |
| timer_rego_query_parse_ns | 31375 |
+------------------------------+-----------+
+--------------+----------+----------+-----------------------------+
| TIME | NUM EVAL | NUM REDO | LOCATION |
+--------------+----------+----------+-----------------------------+
| 195.567117ms | 254 | 63254 | slowunion/slowunion.rego:15 |
| 114.836799ms | 63503 | 126253 | slowunion/slowunion.rego:11 |
| 2.148999ms | 2 | 2 | data.slowunion.combined |
| 160.375µs | 3 | 3 | slowunion/slowunion.rego:6 |
| 117.471µs | 1 | 251 | slowunion/slowunion.rego:10 |
| 110.126µs | 3 | 3 | slowunion/slowunion.rego:7 |
| 10.416µs | 1 | 1 | slowunion/slowunion.rego:4 |
| 2.084µs | 1 | 1 | slowunion/slowunion.rego:3 |
+--------------+----------+----------+-----------------------------+
real 0m0.354s
user 0m0.426s
sys 0m0.029s
System info:
$ uname -a
Darwin styra-macbook.local 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000 arm64
$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:
System Software Overview:
System Version: macOS 12.4 (21F79)
Kernel Version: Darwin 21.5.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: styra-macbook
User Name: Charles Daniels (cad)
Secure Virtual Memory: Enabled
System Integrity Protection: Enabled
Time since boot: 5 days 15:17
Hardware:
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro18,3
Chip: Apple M1 Pro
Total Number of Cores: 8 (6 performance and 2 efficiency)
Memory: 32 GB
System Firmware Version: 7459.121.3
OS Loader Version: 7459.121.3
Serial Number (system): REDACTED
Hardware UUID: REDACTED
Provisioning UDID: REDACTED
Activation Lock Status: Disabled
$ opa version
Version: 0.43.0
Build Commit: d75bbdd
Build Timestamp: 2022-07-29T21:15:42Z
Build Hostname: Mac-1659129125033.local
Go Version: go1.18.4
Platform: darwin/arm64
WebAssembly: unavailable
@charlesdaniels Thank you for reporting this, and for the detailed analysis/example code. :smile:
I think for now the easiest way to boost performance for this builtin would be to inline a chunk of the set unioning logic into the builtin's definition. That could get rid of a whole level of wasted allocations, since we'd not be throwing away temporary sets along the way.