opa
opa copied to clipboard
nd_builtin_cache format is problematic
The format of the nd_builtin_cache field in decision_logs makes it difficult and computationally expensive to mask sensitive values:
-----------8<------------
"nd_builtin_cache": {
"http.send": {
"[{\"method\":\"get\",\"url\":\"http://example.com\"}]": {
"body": null,
-----------8<------------
The cache currently represents each call as a key-value pair under the built-in name. The arguments to the call are stored in the key and the response is stored in the value. There are two problems with this format.
- If we ever wanted to include other information about the call (e.g., metrics) we would not have a place to put it.
- More importantly, if there are any sensitive values in the args (e.g., the HTTP authorization header) masking them requires relying on regexp, rebuilding the entire map, etc.
I think we should find a different format. A proposal:
nd_builtin_cache: {
calls: {
http.send: [
{
request: [
{method: get, url: "...", headers: {authorization: "..."}}
]
response: ...
}
]
}
}
Each builtin call would be keyed by the function name and then the value would be the unique set of calls made. Masking out values could look like this:
# mask out request authorization header and all response headers
suffixes_to_mask := ["request/0/headers/authorization", "response/headers"]
mask contains op if {
some idx
input.nd_builtin_cache.calls["http.send"][idx]
some suffix in suffixes_to_mask
op := {"op": "remove", "path": sprintf("/nd_builtin_cache/calls/http.send/%v/%v", [x, suffix])}
}
This format would give us room to add additional fields in the future if needed. Since folks are already using the nd_builtin_cache feature and masking values in it, I think we should deprecate the current field (and configuration) and introduce a new one.
I agree in general, but I'd propose we use more generic names:
http.send: [
{
args: [
{method: get, url: "...", headers: {authorization: "..."}}
]
result: ...
}
]
Let's do request
-> args
and response
-> result
. Since there are other non-deterministic builtins (e.g. rand.intn), for which request/response isn't a good description.