REST API policy creation/deletion operations take an extremely long time to process as the number of policies increases
Short description
We have OPA running as an independent pod in Kubernetes (no sidecar) with the following config:
OPA version: 1.0.0
Command:
run --server --log-level=debug --config-file=/policies/opa_config.yaml --set=default_decision=/main/decision --addr=0.0.0.0:8181 /policies
OPA config:
services:
  app-namel:
    url: ${API_URL}/v1
    response_header_timeout_seconds: 5
    credentials:
      bearer:
        token: ${OPA_API_KEY}
We then load our suite of around 4000 policies, with an average of 3 to 5 rules per policy. During this process we noticed that the time it takes to add a new policy increases as the number of existing policies grows. While some increase could be expected, we reach unreasonable processing times fairly quickly. With around 1500 policies created, each policy creation (PUT) operation takes around 6 seconds. Processing time peaked at over 12 seconds once all 4000 policies were loaded.
For the loading process, we've tried both loading policies from disk (with a mounted volume in Kubernetes) and loading via the API, but the end result is the same. When loading from disk, API operations take extremely long once all policies are loaded. When loading via the API, each operation takes longer the more policies already exist, as described above.
We've also experienced unusually high memory usage, which spikes even further when creating new policies. According to Resource Utilization, and considering we're loading around 20k rules, we should be seeing a memory usage of 260MB give or take. Instead, we're sitting at around 500MB, which spikes to 1GB while processing policy creation. We're not loading any external data into OPA (we provide input data at evaluation time along with the policy query).
Steps To Reproduce
- Deploy OPA service
- Load at least 3-5k rules into OPA (in our case bundled in around 1.5k policies)
- Attempt to create, update or delete policies via the REST API and monitor how long those API operations take
Expected behavior
API operations, while they might take a bit longer as more policies are loaded, should be kept at reasonable processing times. Memory consumption should not be as high and shouldn't double during policy creation.
Thank you for reporting this issue.
A question for clarification:
Which section of the docs are you relying on when calculating your 260MB estimate? The memory usage of a compiled rule can vary greatly depending on the rule's Rego implementation.
Some added context:
When a policy is added/modified/deleted via the REST API, OPA will recompile all existing policies, which I believe is the cause of the incremental computational time you're seeing. This is a known issue. A possible workaround is to not incrementally load policies/modules, and instead load them in bulk, if this fits your setup/requirements.
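To make the cost of incremental loading concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which every added policy triggers a recompilation of all policies present at that point; the function names are made up for illustration and real compile cost per module will vary.

```python
def incremental_compilations(n_policies: int) -> int:
    """Module compilations when policies are added one at a time.

    Simplified model (an assumption): adding the k-th policy recompiles
    all k policies present afterwards, so the total is 1 + 2 + ... + n.
    """
    return n_policies * (n_policies + 1) // 2


def bulk_compilations(n_policies: int) -> int:
    """Module compilations when all policies are loaded in a single pass."""
    return n_policies


if __name__ == "__main__":
    for n in (1500, 4000):
        inc = incremental_compilations(n)
        bulk = bulk_compilations(n)
        print(f"{n} policies: incremental={inc}, bulk={bulk}")
```

Under this model, loading 4000 policies one by one amounts to 8,002,000 module compilations in total, versus 4000 for a single bulk load, which is why the per-request time keeps growing.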
Memory consumption should not be as high and shouldn't double during policy creation
To guarantee uptime, OPA needs to keep the old compiled policies in memory to keep serving queries while the new policies are being compiled. This has the side effect of OPA's memory footprint ballooning during policy/bundle updates when compared to normal operations.
Hey Johan, thanks for taking the time to look into the issue.
Regarding the 260MB estimate, I was looking at this section in the docs, although I understand the Rego for each rule can make it vary greatly.
Our biggest issue is the recompilation time and the memory spike it produces. Looking at the implementation, we realised what you just confirmed: the compiler needs to recompile all existing policies every time something changes. Once everything is compiled again, query times are still within milliseconds and memory is kept in check.
The thing with our implementation is that we have a multi-tenant SaaS where users can CRUD policies on their own and independently from one another as each has their own separate tenant. So sending the updates in bulk wouldn't really work: on one hand, we need user-defined policies to take effect as soon as they are created or updated (and to stop taking effect when deleted), and on the other, we can't batch updates from different tenants. And even if we did group all updates in one request, memory consumption would still be an issue, especially as the number of tenants (or of policies in each tenant) grows.
Is there any way of updating policies that wouldn't require a full recompilation? For example, if we were to use bundles and split each tenant's policies into a separate bundle, would that help? We're not sure whether that approach would in the end still need to load and recompile everything, leaving us with the same result but with extra steps.
Related https://github.com/open-policy-agent/opa/issues/2282
The compiler could certainly be improved. In the meantime:
The thing with our implementation is that we have a multi-tenant SaaS where users can CRUD policies on their own and independently from one another as each has their own separate tenant.
What's the reason for running all tenants on a single OPA? The first thing I'd try would probably be to split that up into a group, and do some kind of routing based on the tenant ID, or what have you.
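As a minimal sketch of what routing on tenant ID could look like, assuming a fixed list of OPA instances: the instance URLs and the function name below are hypothetical, and this is only one possible scheme.

```python
import hashlib

# Hypothetical OPA instance addresses; not taken from this thread.
OPA_INSTANCES = [
    "http://opa-0.opa:8181",
    "http://opa-1.opa:8181",
    "http://opa-2.opa:8181",
]


def instance_for_tenant(tenant_id: str, instances: list[str] = OPA_INSTANCES) -> str:
    """Map a tenant ID to one OPA instance using a stable hash."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it modulo the pool size.
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]
```

The same tenant always lands on the same instance, so each OPA only compiles the policies of the tenants assigned to it. Note that plain modulo hashing remaps most tenants whenever the number of instances changes; an explicit tenant-to-instance table or consistent hashing would avoid that.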
We're funneling everything into a single OPA pod because we thought it would be able to support the number of policies we expected to manage. We know it's not the best approach, especially because it makes OPA a single point of failure, but it was much simpler to set up.
We've thought about the solution you propose of splitting tenants into different instances of OPA, but that introduces a lot of complexity, especially on infra. It raises a bunch of questions. How do we set up tenant-based routing to route the queries to the proper instance? How do we split tenants up? How do we move tenants between instances if an instance dies or a tenant grows so much it has to be taken out of its current allocation? We were trying to avoid all that complexity, at least for now.
That said, you're absolutely right: we have to split tenants somehow because the compiler can't keep up. The current approach we're exploring is setting up a separate pod we're calling "OPA Compiler", whose purpose is to compile all of a tenant's policies into a WebAssembly .wasm file by running the CLI (upon our request, when we detect that a tenant has updated its policies).
Because our backend runs in NodeJS, we will set up a shared Kubernetes volume where the compiler will place the .wasm files. The instances that used to query the OPA server will instead load the .wasm file using the opa-wasm package you guys provide and run the evaluation locally using those binaries. We know that WebAssembly is not as fast as the in-memory policies in a dedicated OPA server, but it's still fast enough for our needs, and it allows us to split tenants up and keep compilation times and memory consumption reasonable with relative ease, at least for our current use case and setup.
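For illustration, the per-tenant compile step could be sketched roughly as below. The directory layout, file naming and function names are assumptions; the entrypoint main/decision is taken from the default_decision setting in the original command. As far as I understand, `opa build -t wasm` writes a bundle archive that contains policy.wasm, so the consumer would still need to extract that file.

```python
import subprocess
from pathlib import Path


def wasm_build_command(tenant_dir: Path, output: Path,
                       entrypoint: str = "main/decision") -> list[str]:
    """Build the `opa build` argument list for compiling one tenant to Wasm."""
    return [
        "opa", "build",
        "-t", "wasm",       # compile to the WebAssembly target
        "-e", entrypoint,   # rule used as the Wasm entrypoint
        "-o", str(output),  # output bundle archive (expected to contain policy.wasm)
        str(tenant_dir),    # directory holding this tenant's .rego files
    ]


def compile_tenant(tenant_id: str, policies_root: Path, out_root: Path) -> Path:
    """Compile one tenant's policies; hypothetical layout <root>/<tenant_id>/."""
    output = out_root / f"{tenant_id}.tar.gz"
    subprocess.run(wasm_build_command(policies_root / tenant_id, output), check=True)
    return output
```

Because each invocation only sees one tenant's directory, compile time and memory scale with that tenant's policies rather than with the whole policy set.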
Yeah, just to be clear, I don't think you should have to partition your tenants because OPA couldn't keep up! Just that it's probably a good idea anyway, and one that likely alleviates the problem until a proper fix is deployed.
As for the problem itself, I'd love to learn more about the root cause, as it feels to me like there is some issue more specific than "compilation is slow" at play when it takes that much time. After you reported this I had a go at optimizing some code in the hot paths of compilation, and while perf improvements are nice in any case, I was never able to find anything that would be that slow even with thousands of policies compiled. I've only tested compilation in isolation though, so I'm wondering if it could be something we do in the REST API, or somewhere on the way from there to the compiler. I wonder if it's simply a case of every update requiring compilation, and the updates queuing up, eventually having OPA spend all of its resources on that. Or have you found that with 4000 policies loaded (and loading confirmed to be done), a single update would still take 10+ seconds?
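One way to answer that last question would be to time individual updates strictly one at a time, after loading has finished, so that queuing can be ruled out. A minimal sketch using the Policy API's PUT /v1/policies/<id> endpoint; the base URL and helper names are assumptions, and any policy ID or Rego text passed in is up to the caller.

```python
import time
import urllib.request

OPA_URL = "http://localhost:8181"  # assumption: OPA reachable on the port from --addr


def build_policy_put(policy_id: str, rego_source: str,
                     base_url: str = OPA_URL) -> urllib.request.Request:
    """Build a PUT /v1/policies/<id> request carrying raw Rego text."""
    return urllib.request.Request(
        url=f"{base_url}/v1/policies/{policy_id}",
        data=rego_source.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )


def timed_put(policy_id: str, rego_source: str, base_url: str = OPA_URL) -> float:
    """Send one policy update and return the elapsed wall-clock seconds."""
    request = build_policy_put(policy_id, rego_source, base_url)
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # wait for the full response before stopping the clock
    return time.perf_counter() - start
```

Calling timed_put a few times in a row with a small probe policy, each call waiting for the previous one to return, would show whether a single isolated update still takes 10+ seconds or whether the slowdown only appears when updates overlap.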