Allow policy to perform transforms on data updates in store
It's quite common for Rego policies to transform the data they need as part of a policy decision. This could, for example, be done to narrow down a huge list of users or resources to only those carrying certain attributes of interest for certain policy decisions, or it could mean concatenating multiple attributes into a string in order to present something nice to the querying client.
Sometimes these operations depend on the input, but it's certainly not unusual to see transformations applied with every policy decision even when those transforms could have been computed just once. And to be clear, this is normally not an issue. For small/medium datasets, the cost of e.g. filtering some attributes into an intermediate set or object before using that for the actual "business logic" will most likely have a negligible impact on performance. There are, however, cases where either the dataset is large enough, or the transformations performed are costly enough, that doing them at evaluation time will impact performance. Another benefit of moving transformations out of evaluation is that some policies could likely be made more readable and easier to follow.
I suggest that we add a simple system to allow Rego policy under a reserved path (like system.store) to add something similar to materialized views from the database world — queries that are evaluated when the store is updated, and which perform desired transforms whose result is persisted to the store, and made accessible during normal policy evaluation along with any other data.
Now, there are many ways this could be done, and one way does not exclude another. But to start out simple, I'm suggesting these queries run after data is written to the store, and are thus allowed to query any current state under data, as any normal policy would. Trivial example:
```rego
package system.store

developers contains user if {
    some user in data.users
    "developer" in user.roles
}
```
Every time the store is updated, existing system.store policies would be evaluated, and any paths under system.store resolved as part of this (like developers above) would be added to the store and made available to other policies. I'm intentionally leaving the details of the design open to discussion. While I'm aware there are many cool and advanced things that can be done in this space, I think our goal for a first implementation should be a simple solution that's easy to reason about, and that anyone can reach for when performance is a concern.
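To make the proposed flow concrete, here is a minimal sketch in Python (not Rego, and not OPA's actual storage API) of a store that re-evaluates registered views after every write and persists the results under system.store. The names ViewStore, register_view, and write are hypothetical and exist only for illustration; the developers function mirrors the example rule above.

```python
from typing import Any, Callable, Dict, List

View = Callable[[Dict[str, Any]], Any]


class ViewStore:
    """Toy in-memory store. All names are hypothetical, not OPA APIs."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {"system": {"store": {}}}
        self._views: Dict[str, View] = {}

    def register_view(self, name: str, view: View) -> None:
        self._views[name] = view
        self._refresh()

    def write(self, key: str, value: Any) -> None:
        # Views run after the write, so they observe the updated state.
        self.data[key] = value
        self._refresh()

    def _refresh(self) -> None:
        # Naive strategy: recompute every view on every write.
        for name, view in self._views.items():
            self.data["system"]["store"][name] = view(self.data)


def developers(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Counterpart of the `developers` rule: keep users with the developer role.
    return [u for u in data.get("users", []) if "developer" in u.get("roles", [])]


store = ViewStore()
store.register_view("developers", developers)
store.write("users", [
    {"name": "alice", "roles": ["developer", "admin"]},
    {"name": "bob", "roles": ["sales"]},
])

# Regular "policies" would now read the precomputed result like any other data.
developer_names = [u["name"] for u in store.data["system"]["store"]["developers"]]
```

The sketch recomputes every view on every write, which is the simplest behaviour to reason about; whether that is acceptable for large datasets is one of the design details left open here.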
I think this solves a real issue people have, nice! Some thoughts:
- I had wondered if there was a solution to this that allowed users to opt into having data be discarded by policy too. Perhaps this is another issue with a different solution (e.g. a build time one).
- I guess these store policies would be run during bundle activation so data.users and views dependent on it would be consistent? Or could they be evaluated only when first accessed, if a user preferred that?
So the alternative we've discussed internally before is a sort of lazy caching mechanism. Like topdown's virtual cache, but across multiple evals. It would be a transparent optimization for everyone, without special requirements.
Basically, whenever
- a (multi-value or single-value) rule is fully evaluated, and
- the evaluation path required neither input dereferences nor calls to non-deterministic builtins,
then we'd store the result for that ref.
This cache would be invalidated whenever a policy or data update happens.
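As a rough illustration of the rules above, the following Python sketch caches a rule's result per ref across evaluations, skips caching whenever the evaluation touched input or a non-deterministic builtin, and drops the whole cache on a data update. It is a toy model, not topdown's implementation; EvalContext, CachingEvaluator, and the example refs are hypothetical names.

```python
from typing import Any, Callable, Dict


class EvalContext:
    """Records whether an evaluation used input or a non-deterministic builtin."""

    def __init__(self, data: Dict[str, Any], input_doc: Any) -> None:
        self.data = data
        self._input = input_doc
        self.cacheable = True

    @property
    def input(self) -> Any:
        self.cacheable = False  # any input dereference disqualifies the result
        return self._input

    def nondeterministic(self, fn: Callable[..., Any], *args: Any) -> Any:
        self.cacheable = False  # e.g. time or random builtins
        return fn(*args)


class CachingEvaluator:
    def __init__(self, data: Dict[str, Any], rules: Dict[str, Callable[[EvalContext], Any]]) -> None:
        self.data = data
        self.rules = rules
        self._cache: Dict[str, Any] = {}
        self.evaluations = 0  # counts real rule evaluations, for demonstration

    def eval(self, ref: str, input_doc: Any = None) -> Any:
        if ref in self._cache:
            return self._cache[ref]
        ctx = EvalContext(self.data, input_doc)
        self.evaluations += 1
        result = self.rules[ref](ctx)
        if ctx.cacheable:
            self._cache[ref] = result
        return result

    def update_data(self, data: Dict[str, Any]) -> None:
        # Any data (or policy) update invalidates everything.
        self.data = data
        self._cache.clear()


def developers(ctx: EvalContext) -> list:
    return sorted(u["name"] for u in ctx.data["users"] if "developer" in u["roles"])


def allow(ctx: EvalContext) -> bool:
    return ctx.input["user"] in developers(ctx)


users = [{"name": "alice", "roles": ["developer"]}, {"name": "bob", "roles": ["sales"]}]
ev = CachingEvaluator({"users": users}, {
    "data.example.developers": developers,
    "data.example.allow": allow,
})
ev.eval("data.example.developers")
ev.eval("data.example.developers")  # served from the cache
```

In this model the input-dependent allow rule is evaluated on every call, while developers is computed once per data version.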
There are some tie-ins with the rule indexer, I think: You cannot cache a value if you've only run a subset of the rule bodies -- but when the rule indexer actually does its magic, caching is off the table anyhow, as by definition there's some input use.
The advantage is that it's transparent: you write your rules, and you build your virtual documents as you need them, and you can evaluate those Rego policies everywhere, yielding the same result. Only evaluation within OPA (vanilla) would have these optimizations on top. There would be few requirements for user education, for example.
My personal take is that I'd love to see (or do) the above, but I'm also aware that v1/topdown/eval.go is a bit of a special place already, one that's not for everyone. So what I like about @anderseknert's proposal is that it's somewhat orthogonal. (It could even be implemented today, out-of-tree, by a storage wrapper, I think?)
I'm mostly on board with that. My only objection is the lazy aspect.
> This cache would be invalidated whenever a policy or data update happens.
This should also rebuild the cache! Since OPA already would know which rules are "data transformers" and they don't depend on anything the client provides, why have their requests trigger evaluation of potentially costly transformations like building indexes? Having that done outside / independently of request -> decision processing seems like a better approach to me. Am I missing something?
(Also, and more of an implementation detail: we probably don't need to invalidate / rebuild the entire cache on every update, only the data produced by rules that depend on the paths impacted by the update. Right?)
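A small Python sketch of that idea, combining eager rebuilds with dependency-aware invalidation: each view declares the paths it reads, and an update rebuilds only the views that depend on the written path. Everything here is an assumption made for illustration. SelectiveViewCache is a hypothetical name, dependencies are declared by hand rather than derived from the rules, and paths are simplified to top-level keys, whereas a real implementation would need to match ref prefixes.

```python
from typing import Any, Callable, Dict, Set, Tuple

ViewSpec = Tuple[Set[str], Callable[[Dict[str, Any]], Any]]


class SelectiveViewCache:
    """Hypothetical cache that eagerly rebuilds only the affected views."""

    def __init__(self, data: Dict[str, Any], views: Dict[str, ViewSpec]) -> None:
        self.data = data
        self.views = views
        # Build everything once up front, outside of request handling.
        self.results = {name: fn(data) for name, (_, fn) in views.items()}
        self.rebuilds = {name: 1 for name in views}

    def update(self, path: str, value: Any) -> None:
        self.data[path] = value
        for name, (deps, fn) in self.views.items():
            if path in deps:  # untouched views keep their cached result
                self.results[name] = fn(self.data)
                self.rebuilds[name] += 1


def developers(data: Dict[str, Any]) -> list:
    return [u["name"] for u in data["users"] if "developer" in u["roles"]]


def public_resources(data: Dict[str, Any]) -> list:
    return [r["id"] for r in data["resources"] if r["public"]]


cache = SelectiveViewCache(
    {
        "users": [{"name": "alice", "roles": ["developer"]}],
        "resources": [{"id": "r1", "public": True}],
    },
    {
        "developers": ({"users"}, developers),
        "public_resources": ({"resources"}, public_resources),
    },
)
cache.update("resources", [{"id": "r1", "public": False}, {"id": "r2", "public": True}])
```

After the update to resources, only public_resources is recomputed; developers keeps the result built at startup.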