templates deployed with chart not ready before constraints are delivered

Open Morriz opened this issue 3 years ago • 5 comments

What steps did you take and what happened:

We currently split the Gatekeeper deployment into three charts, installed one after another:

  1. operator
  2. artifacts (includes config.gatekeeper.sh/v1alpha1 and constraint templates)
  3. constraints

NOTE: We don't disable validation on install, so no webhook is blocked from doing its work.

The constraints deployment fails with the following error repeated for each missing template:

Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "BannedImageTags" in version "constraints.gatekeeper.sh/v1beta1"
...

We think this happens because chart 2 reports ready before its templates have been fully processed into available CRDs.

If that assumption is correct, shouldn't there be a validating webhook that holds off on reporting ready until a template it has received is fully processed? Why would it report ready too soon?
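
To test that assumption, one option is to look at the generated CRD directly rather than at the chart's ready status. The following is a minimal sketch, not Gatekeeper's own logic: it takes the JSON of a CustomResourceDefinition (for example from `kubectl get crd <name> -o json`) and checks the standard `Established` and `NamesAccepted` conditions. The function name is hypothetical, and the CRD name `bannedimagetags.constraints.gatekeeper.sh` mentioned in the comment is an assumption derived from the kind in the error above.

```python
from typing import Any, Mapping


def crd_is_established(crd: Mapping[str, Any]) -> bool:
    """Return True if a CRD object reports both Established and NamesAccepted.

    `crd` is the parsed JSON of a CustomResourceDefinition, e.g. the output of
    `kubectl get crd bannedimagetags.constraints.gatekeeper.sh -o json`
    (the CRD name here is an assumption, not taken from the issue).
    """
    status = crd.get("status") or {}
    conditions = status.get("conditions") or []
    # Map condition type -> status string ("True", "False", "Unknown").
    status_by_type = {c.get("type"): c.get("status") for c in conditions}
    return (
        status_by_type.get("Established") == "True"
        and status_by_type.get("NamesAccepted") == "True"
    )
```

If this returns False right after chart 2 reports ready, that would support the idea that the constraints chart is installed before the API server can serve the new kind.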

What did you expect to happen:

The templates to be processed and available as CRDs.

Anything else you would like to add:

Environment:

  • Gatekeeper version: 3.4.0
  • Kubernetes version: 1.19.8

Morriz avatar Jun 16 '21 11:06 Morriz

We also see that after stage 2 the validating webhook is not accepting requests to create constraints:

Error: Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": dial tcp 192.168.83.239:8443: connect: connection refused

So my assumption that the webhook was handling the templates is not correct, and this issue might be a duplicate, as we have seen this error before.
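
A "connection refused" from the webhook service suggests that nothing was listening behind it yet. A rough way to check this before creating constraints is to inspect the Endpoints object of `gatekeeper-webhook-service` for ready addresses. The sketch below assumes the pod's readiness reflects whether the webhook is actually serving, which may not hold in every version; the function name is hypothetical.

```python
from typing import Any, List, Mapping


def ready_addresses(endpoints: Mapping[str, Any]) -> List[str]:
    """Collect the IPs listed as ready in a core/v1 Endpoints object.

    `endpoints` is parsed JSON, e.g. from
    `kubectl -n gatekeeper-system get endpoints gatekeeper-webhook-service -o json`.
    Addresses under `notReadyAddresses` are deliberately ignored.
    """
    ips: List[str] = []
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ip = address.get("ip")
            if ip:
                ips.append(ip)
    return ips
```

An empty list means the service has no ready backend, so any request routed through the webhook is likely to fail in the way shown above.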

Morriz avatar Jun 16 '21 12:06 Morriz

One (hopefully) helpful observation:

The operator starts really quickly but then goes dormant. In the operator logs we see a timeout of around 2-3 minutes before something activates (the webhook service?). In the meantime we cannot deploy constraints, but after that 'activation' we can. Why does it take so long to start accepting new constraints?

Morriz avatar Jun 16 '21 12:06 Morriz

We are having similar issues with an automated deployment. We use a Helm chart to deploy constrainttemplates (under Terraform with the Helm provider), and then another Helm chart to deploy constraints. We have to use elaborate post-install hooks that sleep for many minutes before we can deploy the constraints (and even then, it isn't 100% reliable).
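
A fixed sleep is either too long or not long enough. A bounded polling loop is a common alternative: it returns as soon as a readiness check passes and gives up after a deadline. The sketch below is generic and not taken from any chart discussed here; `wait_for`, its parameters, and the default timings are hypothetical, and `check` stands for whatever readiness test fits (for example, that every expected constraint CRD exists).

```python
import time
from typing import Callable


def wait_for(
    check: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` has elapsed.

    Returns True as soon as the check passes, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A post-install hook could call `wait_for` with its check and exit non-zero when it returns False, so a timeout fails the release instead of silently continuing.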

jpriebe avatar Jul 27 '21 19:07 jpriebe

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 23 '22 06:07 stale[bot]

@jpriebe we introduced a wait for the existence of the resources in redkubes/otomi-core. Check the charts/gatekeeper-artifacts folder.

Morriz avatar Aug 01 '22 14:08 Morriz

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 11 '22 02:10 stale[bot]