opa
opa copied to clipboard
Duplicate keys in data structure
Short description
While working with OPA we (@eyalkraft, @uri-weisman) encountered some weird behavior regarding having both a policy and a data file in the same package.
In the bundle, we build we have directories with each having a rego
file, a data.yaml
file, and a rego
test file.
The weird behavior we observe is that while iterating over the keys in the data object, we get duplicate keys for these directories that have both a rego
file and a data.yaml
file.
Steps To Reproduce
Here is an OPA Playground example in order to emphasize the problem
- https://play.openpolicyagent.org/p/1xyU9sDHPE
Policy
package a
\# Here a is duplicated
example1 := [ key | data[key].metadata ]
\# Here a is duplicated as well
example2 := [ {key: value} | value := data[key].metadata ]
\# Here a isn't duplicated since we're building a set and not an array
example3 := { {key: value} | value := data[key].metadata }
Data
{
"a": { "metadata": "interesting" },
"b": { "metadata": "other" }
}
Output
{
"example1": [
"a",
"b",
"a"
],
"example2": [
{
"a": "interesting"
},
{
"b": "other"
},
{
"a": "interesting"
}
],
"example3": [
{
"a": "interesting"
},
{
"b": "other"
}
],
"metadata": "interesting"
}
Expected behavior
Uniqueness in the output keys.
Output
{
"example1": [
"a",
"b"
],
"example2": [
{
"a": "interesting"
},
{
"b": "other"
}
],
"example3": [
{
"a": "interesting"
},
{
"b": "other"
}
],
}
Thanks! Bug or not, whatever is going on here certainly isn't intuitive. Reducing your example some:
a.rego
package a
a.json
{"a":"foo"}
❯ opa eval -f pretty -d a.rego -d a.json 'count(data)'
1
❯ opa eval -f pretty -d a.rego -d a.json 'data'
{
"a": "foo"
}
❯ opa eval -f pretty -d a.rego -d a.json 'data[x]'
+-----+---------+
| x | data[x] |
+-----+---------+
| "a" | "foo" |
| "a" | "foo" |
+-----+---------+
@srenatus can you explain? 😄
Just as an aside, it's normally not a good idea to iterate over data
directly, as it easily leads to recursion, which will halt evaluation. Prefer to iterate over an attribute under data, like data.app.policies[x]
where the iteration is performed from another package, like data.authz
or something like that.
Haven't yet looked closely. Does it also happen if we're iterating a sub-strucutre, like data.foo[x]
?
@srenatus yes, seems so:
❯ opa eval -f pretty -d . data
{
"foo": {
"a": "foo"
}
}
❯ opa eval -f pretty -d . 'data.foo[x]'
+-----+-------------+
| x | data.foo[x] |
+-----+-------------+
| "a" | "foo" |
| "a" | "foo" |
+-----+-------------+
It looks like we're hitting this condition where:
-
doc
contains your data,{"a": {"metadata": "interesting"}, "b": {"metadata": "other" } }
here and -
e.node
, the retrieved rule tree node, also has ana
key here
we then call biunify with the iterator again, effectively adding another item to the array that's built.
Not sure yet what the best approach will be. But I'd predict that we get similar behaviour for arrays and sets if we try hard enough 🤔
Fun fact, wasm does the right thing:
% opa eval -t wasm -fpretty -d scratch/duplicate-keys/p.rego -d scratch/duplicate-keys/d.json data.a.example1
[
"a",
"b"
]
💭 @jeniawhite @eyalkraft @uri-weisman Perhaps you would like to give it a shot? 😃
We'll try and have a look on it soon 👍
Sorry for the late response @srenatus.
I've attempted to build with wasm and encountered some problems.
This is the project: https://github.com/elastic/csp-security-policies.
We have this code main.rego, that cycles across benchmarks
.
If you add a print there for the benchmarks
like this:
findings = f {
not data.activated_rules
# aggregate findings from all benchmarks
f := [finding | var := compliance[benchmarks]
print("DUPLICATE BENCHMARKS", benchmarks)
var.findings[finding]]
}
Build with wasm while located in the project root folder:
opa build bundle/ -t wasm -e bundle/compliance
Create an empty object input.json
file.
When using https://github.com/open-policy-agent/npm-opa-wasm/tree/main/examples/nodejs-app with the policy.wasm
from the bundle.tar.gz
and running the following command:
node app.js '{}'
I'm unable to see anything and my output is an empty array ([]
).
Am I'm missing something?
I've looked at the bundle.tar.gz
and it seemed to be correct.
Running the following evaluation:
opa eval -b bundle.tar.gz data.main -i input.json
You will be able to see in the output the following:
DUPLICATE BENCHMARKS cis_eks
DUPLICATE BENCHMARKS cis_k8s
DUPLICATE BENCHMARKS cis_eks
DUPLICATE BENCHMARKS cis_k8s
DUPLICATE BENCHMARKS lib
{
"result": [
{
"expressions": [
{
"value": {
"findings": [],
"metadata": {
"opa_version": "0.42.0-dev"
}
},
"text": "data.main",
"location": {
"row": 1,
"col": 1
}
}
]
}
]
}
Notice that we have cis_eks
and cis_k8s
duplicated.
We would like to continue and build/consume the bundle as a rego
target instead of wasm
.