
Add Constraint-Level Audit Override Settings

Open jimmyraywv opened this issue 2 years ago • 4 comments

Describe the solution you'd like: I would like to be able to control the audit functionality at the constraint level. Currently, the settings described here are system-wide settings. While testing the new External Data Provider feature, I realized it would be very useful to be able to override the Gatekeeper system-level audit settings by surfacing constraint-level settings, such as:

  • --constraint-violations-limit=123
  • --audit-from-cache=true
  • --audit-interval=123 (most desired)

Anything else you would like to add: N/A

Environment:

  • Gatekeeper version: v3.8.1
  • Kubernetes version (from kubectl version): Client Version v1.23.5, Server Version v1.22.6

jimmyraywv, May 18 '22 18:05

What's your use case here? Are you experiencing performance issues with certain Constraints that trigger on many objects in your cluster? (e.g., does your cluster slow down or experience excessive CPU usage whenever audit is triggered?)

willbeason, May 25 '22 17:05

@willbeason The initial use case is that the audit activity was triggering the policy to call the external data provider every 60s, and the external data provider was then making calls outside of the cluster. While we can certainly add caching to the service called by the external data provider, the situation made me think that not all policies are of the same criticality or priority for auditing purposes. It made sense to me that constraints could provide that policy audit granularity: without audit properties on the constraint, the policy would use the default audit settings; with constraint-level settings, policy audit tiers with different audit intervals would be possible.

jimmyraywv, May 25 '22 17:05

Thanks for responding! That makes sense - I'll bring that up in our meetings.

willbeason, May 25 '22 18:05

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot], Jul 24 '22 21:07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot], Oct 11 '22 02:10

Can we get this considered still? It is similar to Issue #2266.

jimmyraywv, Nov 08 '22 00:11

Do I need to open a new issue?

jimmyraywv, Nov 08 '22 00:11

Un-marking as stale.

#2266 is a bit different in that it is asking for a flag as to whether a constraint should be audited at all.

Having different audit cycles and configurations for different constraints is a heavier lift, to the point where, as requested, I'm not sure it's possible.

One thing that could be interesting would be to have a way to partition different constraints to be evaluated by different pods, as a way of load balancing. Then, it would be possible to run multiple audit pods with different high-level configurations and it could get the same top-level behavior.
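One way to express that partitioning, sketched under assumptions: each audit pod is started with a tier name and evaluates only the constraints labeled for that tier, so pods launched with different flags (interval, violation limit, cache use) give per-tier behavior. The label key `example.com/audit-tier`, the `constraint` type, and `ownedBy` are hypothetical; nothing like this exists in Gatekeeper.

```go
package main

import "fmt"

// tierLabel is a hypothetical label key; Gatekeeper defines no such label.
const tierLabel = "example.com/audit-tier"

// constraint is a minimal stand-in for a constraint object's metadata.
type constraint struct {
	Name   string
	Labels map[string]string
}

// ownedBy returns the constraints a given audit pod should evaluate.
// Unlabeled constraints (or an empty label value) fall to the "default" pod.
func ownedBy(cs []constraint, podTier string) []string {
	var out []string
	for _, c := range cs {
		tier, ok := c.Labels[tierLabel] // reading a nil map is safe in Go
		if !ok || tier == "" {
			tier = "default"
		}
		if tier == podTier {
			out = append(out, c.Name)
		}
	}
	return out
}

func main() {
	cs := []constraint{
		{Name: "k8srequiredlabels-ns"},
		{Name: "ratifyverification-all", Labels: map[string]string{tierLabel: "external-data"}},
	}
	fmt.Println("default pod:", ownedBy(cs, "default"))
	fmt.Println("external-data pod:", ownedBy(cs, "external-data"))
}
```

Explicit labels are used here rather than hashing constraint names across pods: hashing spreads load evenly but cannot pin a particular constraint to a pod running a particular audit configuration.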

I'd want to get a bit more signal on the number of use cases and how frequently the need arises in order to know how to prioritize this.

maxsmythe, Nov 08 '22 00:11

WRT external data providers, I would cache at the provider layer, as providers are best positioned to know what sort of cache-invalidation model makes sense and how it should be configured.

maxsmythe, Nov 08 '22 00:11

Yep, we are looking into that. Currently, is disabling the gatekeeper-audit the only way to stop auditing?

jimmyraywv, Nov 08 '22 00:11

For a specific constraint? I believe so, though I'd imagine webhook evaluation to be higher-traffic than audit.

Would the ability to disable a specific constraint for audit (if not the finer-grained tuning you're asking for) be a useful short-term mitigation for you?

maxsmythe, Nov 08 '22 01:11

So, I was thinking about a CRD change like so:

```yaml
spec:
  crd:
    spec:
      names:
        kind: RatifyVerification
      audit: false
      validation:
        legacySchema: true
```

Then change the getAllConstraintKinds() function to check for that flag and skip adding the GVK to the unique map when audit=false.
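A self-contained sketch of the proposed filter. It uses a local `gvk` struct instead of the real Kubernetes `schema.GroupVersionKind`, and assumes the `audit` flag from each ConstraintTemplate has already been collected into a map keyed by kind; how that map would be populated, and the names `templateAuditFlags` and `filterAuditableKinds`, are assumptions rather than Gatekeeper code.

```go
package main

import "fmt"

// gvk is a local stand-in for schema.GroupVersionKind so the sketch has no
// external dependencies.
type gvk struct {
	Group, Version, Kind string
}

// templateAuditFlags is a hypothetical view of the proposed `audit` field,
// keyed by constraint kind. A missing entry means audit stays enabled.
type templateAuditFlags map[string]bool

// filterAuditableKinds builds the unique GVK set as before, but skips any
// kind whose template explicitly sets audit: false.
func filterAuditableKinds(discovered []gvk, flags templateAuditFlags) map[gvk]struct{} {
	kinds := make(map[gvk]struct{})
	for _, k := range discovered {
		if audit, set := flags[k.Kind]; set && !audit {
			continue // opted out of audit via the proposed flag
		}
		kinds[k] = struct{}{} // map keys deduplicate repeated GVKs
	}
	return kinds
}

func main() {
	discovered := []gvk{
		{"constraints.gatekeeper.sh", "v1beta1", "RatifyVerification"},
		{"constraints.gatekeeper.sh", "v1beta1", "K8sRequiredLabels"},
	}
	flags := templateAuditFlags{"RatifyVerification": false}
	for k := range filterAuditableKinds(discovered, flags) {
		fmt.Println("auditing:", k.Kind)
	}
}
```

Treating a missing flag as "audit enabled" keeps every existing template behaving exactly as it does today.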

I guess you could also hack at it and use a no-audit suffix on a shortName and then look for that suffix in the getAllConstraintKinds ShortNames() array.
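The shortName workaround reduces to a suffix check over a resource's short names. `noAuditSuffix` and `hasNoAuditShortName` are hypothetical, and this sketch does not verify that such a short name would pass CRD naming validation.

```go
package main

import (
	"fmt"
	"strings"
)

// noAuditSuffix is the hypothetical marker; Gatekeeper attaches no meaning to it.
const noAuditSuffix = "no-audit"

// hasNoAuditShortName reports whether any short name carries the marker
// suffix, which an audit loop could use to skip that constraint kind.
func hasNoAuditShortName(shortNames []string) bool {
	for _, s := range shortNames {
		if strings.HasSuffix(s, noAuditSuffix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasNoAuditShortName([]string{"rv", "rv-no-audit"}))
	fmt.Println(hasNoAuditShortName([]string{"k8srl"}))
}
```

This works without an API change, but it encodes behavior in a naming convention, which is easier to break silently than an explicit `audit` field.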

jimmyraywv, Nov 08 '22 01:11

I think this might help with reducing calls to external data providers in audit https://github.com/open-policy-agent/gatekeeper/issues/2386

ritazh, Nov 08 '22 01:11

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot], Jan 07 '23 05:01