
Modifying large input or data is slow

Open matt-phylum opened this issue 2 years ago • 0 comments

What is the underlying problem you're trying to solve?

As a workaround for #5801, I have a large input document that I modify using the with keyword to parameterize incremental rules.

However, performance is very slow. It looks like changing any value stored in data or input triggers a deep clone of the entire data or input namespace. If you evaluate a rule for every item in the input collection, the total copying work grows as n^2 in the size of the collection, because the entire set of elements is copied once for each element the rule is invoked for.
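
To make the quadratic cost concrete, here is a minimal Python model of the suspected behavior. It is not OPA's actual code: `evaluate_with_deep_clone` is a hypothetical stand-in for evaluating a rule once per element with input.element overridden, assuming the whole input is deep-cloned on every override.

```python
import copy

def evaluate_with_deep_clone(collection):
    """Count how many collection elements get copied in total.

    Hypothetical model: each per-element override deep-clones the
    entire input document before setting the overridden key.
    """
    copied = 0
    for element in collection:
        # Stand-in for `rule with input.element as element`.
        scoped_input = copy.deepcopy({"collection": collection})
        scoped_input["element"] = element
        copied += len(scoped_input["collection"])
    return copied

# 100 elements, each override copies all 100 of them.
print(evaluate_with_deep_clone([{"id": i} for i in range(100)]))  # 10000
```

Under this model, the 10,000-element collection from the test below would copy 100,000,000 elements, which would be consistent with the timeout.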

Describe the ideal solution

Using the with keyword to mutate data or input should not require deep clones. The new state should be either a copy-on-write shallow clone of the previous state or an overlay on top of it.
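
As a rough illustration of the copy-on-write idea, the Python sketch below copies only the objects along the overridden path and shares everything else with the previous state. The function name `with_override` and the dict-based document model are assumptions made for this example, not anything from OPA.

```python
def with_override(doc, path, value):
    """Return a new document with `value` at `path`.

    Only the dicts along `path` are shallow-copied; every other
    subtree is shared with `doc`, which is left unmodified.
    """
    if not path:
        return value
    key, rest = path[0], path[1:]
    # A non-object on the path is replaced by a fresh object.
    new_doc = dict(doc) if isinstance(doc, dict) else {}
    child = new_doc.get(key, {})
    new_doc[key] = with_override(child, rest, value)
    return new_doc

base = {"collection": [{"id": 0}, {"id": 1}], "config": {"mode": "strict"}}
scoped = with_override(base, ["element"], base["collection"][0])

print(scoped["collection"] is base["collection"])  # True: shared, not copied
print("element" in base)                           # False: previous state untouched
```

With this approach the cost of one override depends on the depth of the path and the number of keys at each level along it, not on the total size of the document.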

Describe a "Good Enough" solution

As a workaround, because data and input are separate, it's possible to write values into whichever is smaller, but this trick only works once. If your policy requires modifying a value before calling a rule and both data and input already contain a large amount of data you can't just discard, you're out of luck.

If the policy author were able to create arbitrary top-level names, or if the top-level keys of data and input (data[_] and input[_]) were treated specially, so that with data.a requires no deep cloning and with data.a.b requires a deep clone of only data.a, that would probably be enough.
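
The "good enough" variant can be sketched the same way in Python. This is only a model of the proposal, with a hypothetical function name and plain dicts standing in for the data document: overriding a top-level key clones nothing, and overriding a nested path deep-clones only the top-level subtree that contains it.

```python
import copy

def override_scoped_to_top_level(doc, path, value):
    """Deep-clone only the subtree under path[0]; share all others."""
    head, rest = path[0], path[1:]
    new_doc = dict(doc)  # shallow copy of the top level only
    if not rest:
        # Like `with data.a as v`: nothing needs to be cloned.
        new_doc[head] = value
        return new_doc
    # Like `with data.a.b as v`: clone data.a and nothing else.
    subtree = copy.deepcopy(doc.get(head, {}))
    if not isinstance(subtree, dict):
        subtree = {}
    node = subtree
    for key in rest[:-1]:
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[rest[-1]] = value
    new_doc[head] = subtree
    return new_doc

data = {"a": {"b": 1, "c": [1, 2]}, "big": {"x": list(range(5))}}
out = override_scoped_to_top_level(data, ["a", "b"], 2)
print(out["big"] is data["big"])  # True: the large sibling subtree is shared
print(out["a"] is data["a"])      # False: only data.a was cloned
```

The cost of an override is then bounded by the size of one top-level subtree, so a policy author could keep the frequently overridden value under a small top-level key.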

Additional Context

# example.rego
package example
import future.keywords

root_input contains "ok" if {
    some element in input.collection
    item_input with input.element as element
}

item_input contains "ok" if {
    some value in input.element
    value == 1
}

root_data contains "ok" if {
    some element in input.collection
    item_data with data.element as element
}

item_data contains "ok" if {
    some value in data.element
    value == 1
}

# Tests for example.rego (separate file, same package)
package example
import future.keywords

collection := [ { "id": i, "contents": [ j | some j in numbers.range(0, 10) ] } | some i in numbers.range(0, 10000) ]

test_root_input {
    # times out
    root_input with input.collection as collection
}

test_root_data {
    root_data with input.collection as collection
}

matt-phylum, Mar 30 '23 18:03