
No automated way to wait on constraint template CRD upgrades before updating constraints

Boojapho opened this issue 4 years ago

Use case:

I am using a tool like flux or argo to manage deployments into a Kubernetes cluster. The Gatekeeper system is in one Helm deployment and the Gatekeeper constraints are in another Helm deployment with the latter dependent on the former.

On the first deployment, I can use helm with --wait to ensure that the Gatekeeper system is up and running and all CRDs are deployed before the Helm release for the constraints runs.

Let's say I add a field to the CRD for a constraint template. On my next helm upgrade, the CRD must be in place before the constraint that uses that field is deployed, or it will fail schema validation. Since Helm sees the Gatekeeper deployment as already up and running with no changes, and the ConstraintTemplate resource as already deployed, it completes quickly. But Gatekeeper has not yet processed the new ConstraintTemplate. The Constraint is deployed next and fails because the CRD doesn't have the new field.
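For illustration, suppose allowedRegex is the newly added field (a hypothetical upgrade; the names match the template shown later in this thread). A constraint like the following is rejected by the old constraint CRD (or has the field silently pruned, depending on schema settings) until Gatekeeper regenerates the CRD:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Namespace"]
  parameters:
    labels:
    - key: team
      allowedRegex: "^[a-z]+$"  # the new field the old CRD doesn't know about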

There needs to be a way to identify whether the CRDs are up to date with the ConstraintTemplates, so the constraint Helm release knows whether to proceed or wait. Some ideas:

  • On upgrade, force the deployment to report not-ready until Gatekeeper has had a chance to re-check all ConstraintTemplates. This would allow a --wait to block until everything is processed before continuing. Maybe labels/annotations could drive this.
  • Create an API query against Gatekeeper that reports whether it is in sync or out of sync, then use an init container in the constraint Helm chart to check it before continuing.

Boojapho avatar Aug 14 '21 16:08 Boojapho

For reference, I worked around this by adding the chart version to the template labels for the controller pod. Every time I bump the chart version due to a new constraint template, it re-rolls the controller pods (in a rolling update), which completes after all of the new templates are read in. It is not ideal, since the templates may not have changed, but it only adds a few seconds. The --wait option will then hold up any later Helm charts (e.g. constraints) that depend on the changes.
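A minimal sketch of that workaround, assuming a chart-managed copy of the Gatekeeper Deployment (the label key is illustrative):

# deployment.yaml fragment: bumping .Chart.Version changes the pod template,
# which triggers a rolling update of the controller pods
spec:
  template:
    metadata:
      labels:
        chart-version: {{ .Chart.Version | quote }}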

Another option I tried that worked was to put a checksum annotation of all the constraint templates on the controller pods in the deployment. But this would require upkeep whenever constraint templates are added, removed, or renamed, so I opted to keep it simple.
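The checksum variant is the common Helm pattern (the file path is illustrative):

# deployment.yaml fragment: any edit to the templates file changes the hash
# and re-rolls the pods; renaming or moving the file means updating this path
spec:
  template:
    metadata:
      annotations:
        checksum/constraint-templates: {{ include (print $.Template.BasePath "/constraint-templates.yaml") . | sha256sum }}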

Boojapho avatar Aug 14 '21 18:08 Boojapho

I'm wondering if status.byPod would be a good fit for this?

status.byPod[].observedGeneration should equal metadata.generation for all pods once every pod has ingested the constraint template, which includes updating the constraint CRD.

In practice, any pod showing the correct observedGeneration should be sufficient for the code as currently written, but blocking on all pods is safer.

Here is an example constraint template:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"templates.gatekeeper.sh/v1beta1","kind":"ConstraintTemplate","metadata":{"annotations":{},"name":"k8srequiredlabels"},"spec":{"crd":{"spec":{"names":{"kind":"K8sRequiredLabels"},"validation":{"openAPIV3Schema":{"properties":{"labels":{"items":{"properties":{"allowedRegex":{"type":"string"},"key":{"type":"string"}},"type":"object"},"type":"array"},"message":{"type":"string"}}}}}},"targets":[{"rego":"package k8srequiredlabels\n\nget_message(parameters, _default) = msg {\n  not parameters.message\n  msg := _default\n}\n\nget_message(parameters, _default) = msg {\n  msg := parameters.message\n}\n\nviolation[{\"msg\": msg, \"details\": {\"missing_labels\": missing}}] {\n  provided := {label | input.review.object.metadata.labels[label]}\n  required := {label | label := input.parameters.labels[_].key}\n  missing := required - provided\n  count(missing) \u003e 0\n  def_msg := sprintf(\"you must provide labels: %v\", [missing])\n  msg := get_message(input.parameters, def_msg)\n}\n\nviolation[{\"msg\": msg}] {\n  value := input.review.object.metadata.labels[key]\n  expected := input.parameters.labels[_]\n  expected.key == key\n  # do not match if allowedRegex is not defined, or is an empty string\n  expected.allowedRegex != \"\"\n  not re_match(expected.allowedRegex, value)\n  def_msg := sprintf(\"Label \u003c%v: %v\u003e does not satisfy allowed regex: %v\", [key, value, expected.allowedRegex])\n  msg := get_message(input.parameters, def_msg)\n}\n","target":"admission.k8s.gatekeeper.sh"}]}}
  creationTimestamp: "2021-08-16T23:01:44Z"
  generation: 1
  name: k8srequiredlabels
  resourceVersion: "1010332"
  uid: d8e7bf94-da3a-4e3a-8b67-cbd6886dd2c2
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        legacySchema: true
        openAPIV3Schema:
          properties:
            labels:
              items:
                properties:
                  allowedRegex:
                    type: string
                  key:
                    type: string
                type: object
              type: array
            message:
              type: string
  targets:
  - rego: |
      package k8srequiredlabels

      get_message(parameters, _default) = msg {
        not parameters.message
        msg := _default
      }

      get_message(parameters, _default) = msg {
        msg := parameters.message
      }

      violation[{"msg": msg, "details": {"missing_labels": missing}}] {
        provided := {label | input.review.object.metadata.labels[label]}
        required := {label | label := input.parameters.labels[_].key}
        missing := required - provided
        count(missing) > 0
        def_msg := sprintf("you must provide labels: %v", [missing])
        msg := get_message(input.parameters, def_msg)
      }

      violation[{"msg": msg}] {
        value := input.review.object.metadata.labels[key]
        expected := input.parameters.labels[_]
        expected.key == key
        # do not match if allowedRegex is not defined, or is an empty string
        expected.allowedRegex != ""
        not re_match(expected.allowedRegex, value)
        def_msg := sprintf("Label <%v: %v> does not satisfy allowed regex: %v", [key, value, expected.allowedRegex])
        msg := get_message(input.parameters, def_msg)
      }
    target: admission.k8s.gatekeeper.sh
status:
  byPod:
  - id: gatekeeper-audit-6445fb87b7-7nd7l
    observedGeneration: 1
    operations:
    - audit
    - mutation-status
    - status
    templateUID: d8e7bf94-da3a-4e3a-8b67-cbd6886dd2c2
  - id: gatekeeper-controller-manager-854d7945bb-r9kds
    observedGeneration: 1
    operations:
    - webhook
    templateUID: d8e7bf94-da3a-4e3a-8b67-cbd6886dd2c2
  - id: gatekeeper-controller-manager-854d7945bb-whs4c
    observedGeneration: 1
    operations:
    - webhook
    templateUID: d8e7bf94-da3a-4e3a-8b67-cbd6886dd2c2
  - id: gatekeeper-controller-manager-854d7945bb-zszwl
    observedGeneration: 1
    operations:
    - webhook
    templateUID: d8e7bf94-da3a-4e3a-8b67-cbd6886dd2c2
  created: true
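A minimal sketch of a gate built on that check, written as an init container for the constraints chart (the container name, image, and template name are illustrative; it assumes the pod's ServiceAccount is allowed to get constrainttemplates):

# waits until every Gatekeeper pod reports observedGeneration == metadata.generation
initContainers:
- name: wait-for-template-sync
  image: bitnami/kubectl:latest
  command: ["/bin/sh", "-c"]
  args:
  - |
    ct=k8srequiredlabels
    while true; do
      gen="$(kubectl get constrainttemplate "$ct" -o jsonpath='{.metadata.generation}')"
      observed="$(kubectl get constrainttemplate "$ct" -o jsonpath='{.status.byPod[*].observedGeneration}')"
      synced=true
      [ -n "$observed" ] || synced=false  # no status yet counts as out of sync
      for og in $observed; do
        [ "$og" = "$gen" ] || synced=false
      done
      [ "$synced" = "true" ] && exit 0
      echo "waiting for $ct to be ingested by all pods"
      sleep 2
    done

Blocking on all pods matches the safer option above; checking a single byPod entry would be enough for the code as currently written, but is less future-proof.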

maxsmythe avatar Aug 16 '21 23:08 maxsmythe

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 23 '22 06:07 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 11 '22 05:10 stale[bot]

Still something we should simplify.

maxsmythe avatar Oct 27 '22 01:10 maxsmythe

I am seeing this issue when doing an initial deploy of Gatekeeper. The install is via flattened Helm charts (YAML files). The Gatekeeper software and all the constraint templates deploy, but ALL the constraints fail to deploy because the named template for each constraint is not yet found. The constraints deploy correctly later, once they can resolve the templates by name.

I can see this issue also happening on an upgrade with any template change and an associated constraint change, as originally posted. There needs to be a way to ensure all template changes are deployed and current as a prerequisite for deploying or updating constraints.

jvossler-cogility avatar Nov 11 '22 16:11 jvossler-cogility

Thanks for reporting this. Unfortunately this isn't really a problem to be addressed on the Gatekeeper side, but more at the deployment orchestration stage. Going to close this for now, but feel free to add additional comments if needed.
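For anyone who needs the ordering at the orchestration layer, one sketch is Argo CD sync waves (annotation values are illustrative): templates apply in an earlier wave than the constraints that use them. Note that waves only order the applies; they do not by themselves wait for Gatekeeper ingestion, so pairing them with a custom health check on status.byPod (per the earlier comment) is what closes the gap.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
  annotations:
    argocd.argoproj.io/sync-wave: "0"  # templates first
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
  annotations:
    argocd.argoproj.io/sync-wave: "1"  # constraints after templates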

salaxander avatar Feb 22 '24 17:02 salaxander