opa icon indicating copy to clipboard operation
opa copied to clipboard

Add Batch Query REST API

Open nevumx opened this issue 1 year ago • 1 comments

What is the underlying problem you're trying to solve?

Currently, if we have several resources or queries we want to authorize via the HTTP REST API without incurring several round-trip network penalties, there is really only one option, which is to pack them in an array that is POSTed to the normal Query REST API with a URL like http://localhost:8181/v1/data/play/batch, like so:

{
    "input": [
        {"principal": "alice", "action": "read_compensation", "entity": "bob"},
        {"principal": "alice", "action": "read_compensation", "entity": "charlie"}
    ]
}

and then process them through a policy like:

package play

import rego.v1

default allow := false

allow if input == {"principal": "alice", "action": "read_compensation", "entity": "charlie"}

batch := [result |
	some query in input
	result := allow with input as query
]

getting back a result like,

{
    "decision_id": "c7703c24-9eba-44e2-ab05-cbd720ec1d19",
    "result": [
        false,
        true
    ]
}

Example playground: https://play.openpolicyagent.org/p/US8tNMq5wa

While this does work at first, when decision logs for such batch queries are uploaded to a service like Styra DAS, they quickly become unreadable if 10 or 100 or more queries are evaluated as a part of a single batch query.

One strategy to attempt to fix this is to just funnel each individual query into its own query with http.send(...), like so:

batch := [result |
	some query in input
	result := http.send({"method": "POST", "url": "http://localhost:8181/v1/data/play/allow", "body": {"input": enriched_query}}).body.result
]

...And then the batch query logs can be filtered out with some log policies. However, as can perhaps be expected, in practice, this has been found to incur a 5x performance penalty, (i.e. a 100ms query would turn into a 500ms query) likely due to unnecessary serialization/deserialization and traversal of the network stack.

Describe the ideal solution

An ideal solution would be a Batch Query REST API where we could POST an HTTP REST request to a special url like http://localhost:8181/v1/batch/data/play/allow in an array of inputs like so:

{
    "inputs": [
        {"principal": "alice", "action": "read_compensation", "entity": "bob"},
        {"principal": "alice", "action": "read_compensation", "entity": "charlie"}
    ]
}

getting back a result like,

{
    "decision_id": "c7703c24-9eba-44e2-ab05-cbd720ec1d1a",
    "results": [
        false,
        true
    ]
}

and in addition, it would produce a decision log for each query in the array that could then be sent to a service like Styra DAS.

Describe a "Good Enough" solution

In a "Good Enough" solution, we would have a built-in function like:

batch := [result |
	some query in input
	result := opa.evaluate(query) # or perhaps opa.evaluate(path, query) where `path` looks like "/v1/data/play/allow"
]

Which would simply evaluate the argument passed to it as if it came from the HTTP REST Query API, but without needing to run all the serialization/deserialization/network code that would otherwise be needed, however it would produce the associated decision log.

Additional Context

N/A

nevumx avatar Mar 11 '24 20:03 nevumx

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

stale[bot] avatar Apr 10 '24 23:04 stale[bot]