opa icon indicating copy to clipboard operation
opa copied to clipboard

Add generic reduce construct

Open anderseknert opened this issue 1 year ago • 10 comments

In some cases, it would be useful if Rego allowed for limited/local mutation of a variable, similar to reduce from functional languages. Rego is not a functional language, but we could either allow limited use of first-class functions (1½-class functions?) or try and figure out a construct that feels more native in our context. Very much open to ideas here!

Some ideas to get the discussion started :)

"Comprehension style"

sum := reduce([acc | some x in [1, 2, 3]; acc := acc + x])

# or

sum := reduce([acc, x | acc := acc + x, 0, [1, 2, 3]])

"Functional style"

sum := reduce(plus, [1, 2, 3])

"Every style"

sum := reduce acc, x in [1, 2, 3] {
    acc := acc + x
}

We currently have some built-in functions taking either a single value, or a _n "version" taking a collection which does something like a reduce behind the scenes (i.e. not in Rego). Providing a generic construct would avoid having to add more whenever there's a request for that. It would also simplify solving certain class of problems, that may currently require some quite arduous workarounds.

anderseknert avatar Mar 21 '23 09:03 anderseknert

Huh, it seems like "functional style" is the least flexible. For the others, I could imagine restricting the iteration, like

xs := [{"a": 1}, {"b": 2}]

sum := reduce([acc, x | some {"a": x} in xs; acc := plus(acc, x)]) # comprehension style

sum := reduce acc, {"a": x} in [1, 2, 3] { # "every" style
    acc := plus(acc, x)
}

The functional style would -- syntactically -- allow for no such thing.

srenatus avatar Mar 21 '23 09:03 srenatus

Controversial take: Let's have first class functions, and get map / reduce / fold and friends for free. :smiling_imp:

My reasoning for this charged opinion:

  • First-class functions open up a wealth of abstraction/composition opportunities for policy writers, and would dramatically reduce the interest/need for code generators to help with large, repetitive policies.
    • Potential for code reuse might be enough justification on its own for first-class function support. :thinking:
  • The "guaranteed termination / turing incompleteness" property of Rego can still be assured, even if we allow functions as first-class language constructs (allow the evaluator to record recursion depth, and bail out at a preset max depth, like Python does).

What are some obvious problems with my idea?

  • Our termination checks in the compiler would need to be completely rewritten.
  • The evaluator and low-level Term types would need overhauling to allow for first-class function support.
  • The language itself becomes more complex / type checking gets uglier.

Why not one of the other suggested styles?

  • "Comprehension style": Comprehensions aren't allowed to be used anywhere else in the language like that; we'd need to overhaul how they're documented / processed in the evaluator to allow that.
  • "Functional style": We don't have first-class function support. (Yet!) Without first-class functions, this adds a weird "special case" to the language.
  • "Every style": Introducing a new keyword might be one of the more practical ways to support this sort of technique in the short term, but it's a risky syntactic sugar to add, and will still have non-trivial implementation difficulty.

philipaconrad avatar Mar 21 '23 13:03 philipaconrad

I think going fully functional would be a mistake, and I say that as someone with several Clojure projects in my GitHub account :)

We’d essentially introduce an entirely new paradigm to Rego. It’d be familiar to some of course, but worst case it would require new users to understand both the declarative / datalog approach, and the functional one.

I could see it being used for the single purpose of reduce though, but I'm leaning more towards either comprehension style or the every lookalike to be closer to Rego as we know it.

anderseknert avatar Mar 21 '23 13:03 anderseknert

Another option that occurs to me would be to simply use an off the shelf expression language, such as CEL, though there are others.

Then reduce would look something like reduce([1,2,3], "acc + cur") -> 6.

This would have several advantageous properties. For eone, it would allow us to punt on functions-as-values, and on direct variable mutation (since the acc special variable would be managed by the reduce builtin itself). There would be no need for any new syntax nor changes to the evaluator.

Of course the downside is it means that now our DSL (Rego) has another, different DSL inside of it, whichever one we pick.

charlesdaniels avatar Mar 21 '23 14:03 charlesdaniels

Comprehensions aren't allowed to be used anywhere else in the language like that

I don't think we need to redefine how comprehensions work. It should just be enough to define a new type of comprehension alongside those for arrays, sets, and objects: sort of a "value" comprehension that doesn't produce a collection, but rather some arbitrary value using the iterative procedure of a comprehension.

E.g. consider

<x, y := .. | .. >

where the lhs defines two variables: the first is the value to be assigned; and the second is an optional ref to the value (with an optional assigned first value) produced by the previous iteration of the rhs. The value generated by the comprehension would simply be whatever value was last assigned to the first arg (x).

To accomplish the effects of a reduce, one would do:

a := [1, 2, 3, 4]
sum := <x, y := 0 | x := y + a[_]>

There are probably pitfalls with this approach too 😅, but on first light, it doesn't feel like it wreaks havoc with Rego.

johanfylling avatar Mar 21 '23 21:03 johanfylling

Since , isn't allowed on the LHS of a comprehension, what if we simply implemented @johanfylling 's suggestion with the existing []/{} syntax? It would be unambiguous and wouldn't break backwards compat I don't think.

charlesdaniels avatar Mar 21 '23 21:03 charlesdaniels

My reasoning behind introducing <> is to make it possible to define a comprehension that doesn't return a collection. Of course, we could also add this behavior to e.g. the array comprehension, have it behave the same as described above, and then just pluck out the last element manually. But that would require you to construct the entire intermediate collection, which'd be unnecessarily memory consuming.

johanfylling avatar Mar 21 '23 21:03 johanfylling

I'm not opposed to <> either. I think there's a good argument to be made for it since it might be too confusing to have , change the shape of data produced by a comprehension.

Certainly, I have often had to do array comprehensions that only produce one element and then subscript that element out, so I think there's a use case for comprehensions that don't return collections.

I wonder if it might make sense to instead define a new unary operator that means "give me the last element of this collection". If we combined that with , on the LHS of the existing collections, the runtime could detect when the operator is applied to a comprehension and then do the more memory optimal thing.

For the sake of discussion, let's call this operator $, since that means end of line in regex. That would give us something like:

$[1,2,3] # -> 3
${1,2,3} # undefined (or should it be an arbitrary, but deterministic member of the set?
$[1] # -> 1
${1} # -> 1 (this case is unambiguous - "last" is well defined if there is only 1 element)
$[y | x := [1,2,3][_]; y := x*2] # -> 6
$[x, y := 0 | x := y + [1,2,3,4][_]] # -> 10

I kinda like it, and I think the $ operator could be useful for working with lists/sets read from input too.

Edit: I think it also follows that if we have a "last" operator, we should have a "first" operator to complement it. For the sake of discussion, call this ^.

charlesdaniels avatar Mar 21 '23 21:03 charlesdaniels

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

stale[bot] avatar Apr 21 '23 01:04 stale[bot]

After a fair bit of thought, I've come around to liking @anderseknert's original proposal for the "Every style" syntax, and also @johanfylling's "reduction comprehension" syntax proposal.

Each of those two approaches carries the syntactic advantage of not changing the rules about where we allow expressions to appear in the Rego grammar overall. I think reduction comprehensions will prove to be strictly more powerful, but the "every style" syntax block might be easier to teach to newcomers. Either way, we'll need to spend some time on good docs for this powerful new language feature. :slightly_smiling_face:

philipaconrad avatar Oct 23 '23 16:10 philipaconrad