Allow discovery override of operational decision log plugin configurations
What is the underlying problem you're trying to solve?
The decision_log plugin provides several operational configs under the decision_logs.reporting.* config path. However, as soon as you use discovery, these settings can no longer be configured locally. Without, for example, a maximum buffer size, local OPAs can OOM if the log service gets overloaded or applies back pressure.
Some of these settings are needed to adapt OPA instances to local conditions: maximum buffer sizes, maximum client rates, time windows, etc. As soon as the org is sufficiently large, it is pretty common that the team writing the discovery documents is a different, more central team than the one operating the local OPA instances.
We use Styra DAS to serve discovery documents to a fleet of OPA instances. This discovery document is, at the time of writing, auto-generated and cannot be influenced. Even for non-Styra customers, I'd argue that a centrally provided discovery document will not cover all local cases.
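For illustration, a minimal sketch of the kind of local settings this is about. The keys below are the decision_logs.reporting options from OPA's configuration documentation; the concrete values are made up.

```yaml
# Local config the operating team would like to stay effective even when
# the rest of the configuration is delivered via discovery.
decision_logs:
  reporting:
    buffer_size_limit_bytes: 33554432  # cap the in-memory buffer (example: 32 MiB) to avoid OOM
    max_decisions_per_second: 100      # rate-limit decisions handed to the plugin
    min_delay_seconds: 60              # lower bound of the upload window
    max_delay_seconds: 300             # upper bound of the upload window
```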
Describe the ideal solution
Allow local configuration of all decision_logs.reporting.* settings when they are absent from the discovery document, or introduce a flag in the discovery configuration that defines the conflict resolution behaviour.
Describe a "Good Enough" solution
Allow local configuration of all decision_logs.reporting.* settings, with local values always overriding whatever comes via discovery.
So we'd basically make the default config configurable?
The rationale is clear; let's try to fix that.
So we'd basically make the default config configurable?
Yes, if "default config" means what comes in via discovery.
Since overriding a central policy published via discovery can have security implications, I would be rather restrictive about what can be overridden and start small with the config path mentioned above...
What I meant is that some local config would always augment (override?) the config that comes in via discovery. I think we're talking about the same thing 🤞
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
I started drafting a PR and noticed that the caching part of the config always silently gets overwritten (which is a bug IMO). I.e. if you configure a local caching config, it is not treated in the same way as the decision_logs config and accepted: if the discovery document has no caching config, the local setting (maximum bytes) will effectively be silently ignored and the defaults will be applied.
I can fix this in the same PR to allow local override but it changes existing behaviour (suddenly local config will be effective) which might be mildly surprising to users. wdyt?
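To make the above concrete, here is a sketch of the kind of local caching config referred to, assuming the top-level caching block with its inter-query cache size; the value is arbitrary:

```yaml
# Local setting: cap the inter-query builtin cache.
# With discovery enabled, this is currently dropped silently if the
# discovery document contains no caching section, and the defaults apply.
caching:
  inter_query_builtin_cache:
    max_size_bytes: 10000000
```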
The point of the discovery config was to allow admins to centrally control the configs for their OPAs. Allowing local config to override discovery in OPA would make it difficult to accurately judge the runtime config of an OPA instance. I'm not sure what solution you're proposing, but at the very least we should make it explicit that local is overriding discovery, e.g. via some config option, so that admins retain some control over the process. Again, not sure how you plan to implement this; if you have a design doc, that would be helpful before we actually do this.
Looking at your original problem around applying local config to the config provided via discovery, I wonder if you've looked at setting env variables (for example the buffer size) on individual OPA instances and then using those to dynamically generate that OPA's config. This would allow the high-level OPA config to be created by the admins, while certain bits are added dynamically based on the running OPA's environment.
As I pointed out in the original description, in a sufficiently large organisation there is no such thing as "the admins": one team controls the central parts (discovery distribution, general setup, status and log APIs, ...) with a focus on security; SRE teams independent of that are responsible for operations in all Kubernetes clusters and, for example, for making sure OPA instances don't trigger OOM kills; security engineers are responsible for yet other things.
So you're suggesting to use env variables in a central discovery document, expecting local OPA deployments to set them? I always thought that env substitution only works on local configs and only when using the OPA runtime (we use it as a Go library). Apart from the point that we cannot do that with Styra DAS, it feels like very tight coupling, especially if multiple teams are involved and everything needs to fit together to work.
The only thing we'd like to make overridable locally are operational attributes that specify buffer sizes and timeouts (very much like you configure services and their authentication locally). What I started drafting is that plugins can essentially allow a specific set of config attributes to be locally reconfigured: in the decision_logs plugin that is the runtime block, for the caching setting it is the maximum cache size.
That is also when I discovered during testing that the caching settings get silently overwritten (different behaviour to a local decision_logs config, for example).
So you're suggesting to use env variables in a central discovery document, expecting local OPA deployments to set them?
Discovery allows you to generate the OPA config using policy. So you could use the opa.runtime builtin to set the appropriate local settings.
So I myself keep forgetting that ☝️ this is a thing -- discovery can be a policy. ✨
So that policy would allow the OPA instance to determine the proper bytes maxima based on knowledge of the instance, using the logic from the central control plane. We'd start OPA like
MEM=$(ops memory-available) opa run --server [...]
and use opa.runtime().env.MEM in the policy to set the limits. ops memory-available is a command I've just made up. This could be hardcoded, or come from some k8s annotation or something, I guess 🤔
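As a rough sketch (not an actual proposal), a discovery policy along these lines could derive the limit from that env variable; the package name, decision path and the division by four are made up for illustration:

```rego
# Hypothetical discovery policy; the package/rule must match the `decision`
# configured under `discovery` in the local OPA config.
package example.discovery

config := {
	"decision_logs": {
		"reporting": {
			# MEM is the env variable set when starting OPA (see above);
			# reserve a quarter of it for the decision log buffer.
			"buffer_size_limit_bytes": floor(to_number(opa.runtime().env.MEM) / 4),
		},
	},
}
```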
Hah, I also keep forgetting that. I guess that would work if we could change the discovery policy in Styra DAS, which we cannot at the moment (it is implicitly generated). I also checked that the env variables do not seem to be specific to running OPA standalone but should also work when using it as a Go library.
The only drawback I can see is replicating the configuration structure in environment variables or labels. During an incident that requires changing one of those operational attributes, you'd probably need to involve multiple teams if the property has not been preemptively designed into the discovery policy. On the other hand, it should not be more than a handful of attributes.
Now it got me thinking if we cannot wrap a discovery policy around the Styra generated discovery data.json and just point to another discovery.decision 🤔.
Anyways, let me reach out to our CSM and sort this out before we continue here...
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
Hi @mjungsbluth 👋 I'm curious to learn what the outcome was here 🙂 Did the proposed solution work for you?
Hi @anderseknert! Sadly the situation is unchanged as we cannot influence the discovery in Styra DAS. I am waiting on some update over there on which path to take.
Ah, thanks for the update, Magnus! 👍
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
Closing this in favor of https://github.com/open-policy-agent/opa/issues/5722