gatekeeper icon indicating copy to clipboard operation
gatekeeper copied to clipboard

controller should have leader election option when not running webhooks

Open jdef opened this issue 3 years ago • 21 comments

Describe the solution you'd like I'd like to deploy e.g. the "audit" component statically, not using k8s deployments, in HA. It would be very convenient to leverage the controller-runtime's leader election capability for such. Enabling this via CLI flag would be ideal.

Anything else you would like to add: The program should probably crash if leader election AND webhooks were both enabled since webhooks should not depend on apiserver's leader election in a HA setting.

Environment:

  • Gatekeeper version: 3.7.0
  • Kubernetes version: (use kubectl version): 1.21.7

jdef avatar Jan 17 '22 03:01 jdef

@shomron I think we discussed this at some point. Do you know if that discussion is documented anywhere?

Definitely came up in a community meeting once.

maxsmythe avatar Jan 19 '22 02:01 maxsmythe

I think this was discussed https://github.com/open-policy-agent/gatekeeper/discussions/1451#discussioncomment-1062583 (as noted in community call notes) however it seems it was deleted.

ritazh avatar Jan 19 '22 07:01 ritazh

Argh, migrated to the discussions project maybe?

maxsmythe avatar Jan 20 '22 01:01 maxsmythe

We turned off discussions because of opa/feedback migration, however existing discussions couldn't be transferred. I took a screenshot of it now: discussion

sozercan avatar Jan 20 '22 23:01 sozercan

Thanks for digging up that thread. I've already added a couple leader-election flags to my GK fork and will be testing soon. At some point, I'd like to nix the fork though.

jdef avatar Jan 21 '22 13:01 jdef

Depending on the complexity/behavioral cost, I see no reason not to add it in principle. Curious to see what the PR would look like. I think @shomron was the most skeptical of this.

maxsmythe avatar Jan 22 '22 01:01 maxsmythe

suggestions for a PR:

  • two flags (in main.go)
    • leader-election (bool, true to enable)
    • leader-namespace (ns for leader leases, default to "gatekeeper-system")
  • ctrl.Options (in main.go)
    • LeaderElection: maps to leader-election flag
    • LeaderElectionID: gk-leader-12345 (or whatever)
    • LeaderElectionNamespace: maps to leader-namespace flag
    • LeaderElectionResourceLock: "leases"

I'd love to submit a PR for this; sadly there's a TON of corporate red tape in the way. The above changes yield a fairly small patch that's easy to test and backward compat when leader-election is false.

jdef avatar Jun 10 '22 15:06 jdef

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 09 '22 19:08 stale[bot]

/reopen

On Tue, Aug 30, 2022, 7:29 PM stale[bot] @.***> wrote:

Closed #1794 https://github.com/open-policy-agent/gatekeeper/issues/1794 as completed.

— Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/gatekeeper/issues/1794#event-7291632631, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLFXNUSOP4TEX5O6EALV32KLZANCNFSM5MDTNXRA . You are receiving this because you authored the thread.Message ID: @.*** com>

jdef avatar Aug 30 '22 23:08 jdef

max-bot reopening @sozercan what should users do if they want to re-open a bug?

maxsmythe avatar Aug 31 '22 07:08 maxsmythe

Not stale

On Tue, Aug 9, 2022, 3:23 PM stale[bot] @.***> wrote:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

— Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/gatekeeper/issues/1794#issuecomment-1209784472, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLDCA3FGRSFKDTYVHUDVYKV3JANCNFSM5MDTNXRA . You are receiving this because you authored the thread.Message ID: @.***>

jdef avatar Oct 11 '22 09:10 jdef

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 10 '22 10:12 stale[bot]

not stale

jdef avatar Dec 19 '22 12:12 jdef

@jdef Would you like to start a design doc and then bring the discussion to one of the community calls to drive this?

ritazh avatar Dec 19 '22 15:12 ritazh

this seems like a pretty trivial change, is a design doc really needed? otherwise i'm happy to join a call

On Mon, Dec 19, 2022 at 10:11 AM Rita Zhang @.***> wrote:

@jdef https://github.com/jdef Would you like to start a design doc and then bring the discussion to one of the community calls to drive this?

— Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/gatekeeper/issues/1794#issuecomment-1357815562, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLHUPZLMNDQK2WHFN3DWOB3L7ANCNFSM5MDTNXRA . You are receiving this because you were mentioned.Message ID: @.***>

-- James DeFelice

jdef avatar Dec 19 '22 15:12 jdef

Not sure if a full-scale design doc is needed, but would be good to know which controllers are in-scope for this (it sounds like singleton controllers only, such as status and audit), the other controllers should be running if we want hot standbys.

Also it'd be good to avoid the "webhooks cannot run next to audit with leader election enabled" edge case. Is it possible to have leader election be selectively enforced?

Of course, this does mean that a pod will self-restart if it loses the leader position, which, if serving a webhook endpoint, might have some interesting impacts to endpoint behavior.

You mentioned this is for pods not running via deployments and in other bugs you mentioned pods running off-cluster. I'm curious what the story for GC-ing stale by-pod status objects is?

Lastly, is there a reason for the leader election namespace to be different from G8r's namespace (by default this is determined via topdown and passed via env variable).

maxsmythe avatar Dec 19 '22 22:12 maxsmythe

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 18 '23 09:02 stale[bot]