opa icon indicating copy to clipboard operation
opa copied to clipboard

Duplicate keys in data structure

Open jeniawhite opened this issue 2 years ago • 8 comments

Short description

While working with OPA we (@eyalkraft, @uri-weisman) encountered some weird behavior regarding having both a policy and a data file in the same package. In the bundle, we build we have directories with each having a rego file, a data.yaml file, and a rego test file. The weird behavior we observe is that while iterating over the keys in the data object, we get duplicate keys for these directories that have both a rego file and a data.yaml file.

Steps To Reproduce

Here is an OPA Playground example in order to emphasize the problem

  • https://play.openpolicyagent.org/p/1xyU9sDHPE

Policy

package a

\# Here a is duplicated
example1 := [ key | data[key].metadata ]

\# Here a is duplicated as well
example2 := [ {key: value} | value := data[key].metadata ]

\# Here a isn't duplicated since we're building a set and not an array 
example3 := { {key: value} | value := data[key].metadata }

Data

{
    "a": { "metadata": "interesting" },
    "b": { "metadata": "other" }
}

Output

{
    "example1": [
        "a",
        "b",
        "a"
    ],
    "example2": [
        {
            "a": "interesting"
        },
        {
            "b": "other"
        },
        {
            "a": "interesting"
        }
    ],
    "example3": [
        {
            "a": "interesting"
        },
        {
            "b": "other"
        }
    ],
    "metadata": "interesting"
}

Expected behavior

Uniqueness in the output keys.

Output

{
    "example1": [
        "a",
        "b"
    ],
    "example2": [
        {
            "a": "interesting"
        },
        {
            "b": "other"
        }
    ],
    "example3": [
        {
            "a": "interesting"
        },
        {
            "b": "other"
        }
    ],
}

jeniawhite avatar Jun 16 '22 14:06 jeniawhite

Thanks! Bug or not, whatever is going on here certainly isn't intuitive. Reducing your example some:

a.rego

package a

a.json

{"a":"foo"}
❯ opa eval -f pretty -d a.rego -d a.json 'count(data)'
1
❯ opa eval -f pretty -d a.rego -d a.json 'data'
{
  "a": "foo"
}
❯ opa eval -f pretty -d a.rego -d a.json 'data[x]'
+-----+---------+
|  x  | data[x] |
+-----+---------+
| "a" | "foo"   |
| "a" | "foo"   |
+-----+---------+

@srenatus can you explain? 😄

Just as an aside, it's normally not a good idea to iterate over data directly, as it easily leads to recursion, which will halt evaluation. Prefer to iterate over an attribute under data, like data.app.policies[x] where the iteration is performed from another package, like data.authz or something like that.

anderseknert avatar Jun 17 '22 09:06 anderseknert

Haven't yet looked closely. Does it also happen if we're iterating a sub-strucutre, like data.foo[x]?

srenatus avatar Jun 17 '22 10:06 srenatus

@srenatus yes, seems so:

❯ opa eval -f pretty -d . data
{
  "foo": {
    "a": "foo"
  }
}
❯ opa eval -f pretty -d . 'data.foo[x]'
+-----+-------------+
|  x  | data.foo[x] |
+-----+-------------+
| "a" | "foo"       |
| "a" | "foo"       |
+-----+-------------+

anderseknert avatar Jun 19 '22 13:06 anderseknert

It looks like we're hitting this condition where:

  1. doc contains your data, {"a": {"metadata": "interesting"}, "b": {"metadata": "other" } } here and
  2. e.node, the retrieved rule tree node, also has an a key here

we then call biunify with the iterator again, effectively adding another item to the array that's built.

Not sure yet what the best approach will be. But I'd predict that we get similar behaviour for arrays and sets if we try hard enough 🤔

srenatus avatar Jun 20 '22 06:06 srenatus

Fun fact, wasm does the right thing:

% opa eval -t wasm -fpretty -d scratch/duplicate-keys/p.rego -d scratch/duplicate-keys/d.json data.a.example1
[
  "a",
  "b"
]

srenatus avatar Jun 20 '22 09:06 srenatus

💭 @jeniawhite @eyalkraft @uri-weisman Perhaps you would like to give it a shot? 😃

srenatus avatar Jun 22 '22 12:06 srenatus

We'll try and have a look on it soon 👍

eyalkraft avatar Jun 22 '22 14:06 eyalkraft

Sorry for the late response @srenatus.

I've attempted to build with wasm and encountered some problems.

This is the project: https://github.com/elastic/csp-security-policies.

We have this code main.rego, that cycles across benchmarks.

If you add a print there for the benchmarks like this:

findings = f {
	not data.activated_rules

	# aggregate findings from all benchmarks
	f := [finding | var := compliance[benchmarks]
	print("DUPLICATE BENCHMARKS", benchmarks)
	var.findings[finding]]
}

Build with wasm while located in the project root folder: opa build bundle/ -t wasm -e bundle/compliance

Create an empty object input.json file.

When using https://github.com/open-policy-agent/npm-opa-wasm/tree/main/examples/nodejs-app with the policy.wasm from the bundle.tar.gz and running the following command: node app.js '{}'

I'm unable to see anything and my output is an empty array ([]).

Am I'm missing something?

I've looked at the bundle.tar.gz and it seemed to be correct.

Running the following evaluation: opa eval -b bundle.tar.gz data.main -i input.json

You will be able to see in the output the following:

DUPLICATE BENCHMARKS cis_eks
DUPLICATE BENCHMARKS cis_k8s
DUPLICATE BENCHMARKS cis_eks
DUPLICATE BENCHMARKS cis_k8s
DUPLICATE BENCHMARKS lib
{
  "result": [
    {
      "expressions": [
        {
          "value": {
            "findings": [],
            "metadata": {
              "opa_version": "0.42.0-dev"
            }
          },
          "text": "data.main",
          "location": {
            "row": 1,
            "col": 1
          }
        }
      ]
    }
  ]
}

Notice that we have cis_eks and cis_k8s duplicated.

We would like to continue and build/consume the bundle as a rego target instead of wasm.

jeniawhite avatar Jul 07 '22 10:07 jeniawhite