gatekeeper-library

Check Pod policies on Deployments

Open markszabo opened this issue 2 years ago • 4 comments

We are using these policies on our Kubernetes cluster, but we often run into the following situation:

  1. Create a PR with a change to an app (including the Kubernetes YAML manifests)
  2. The PR pipeline runs kubectl apply --dry-run=server, which also evaluates the validating webhooks, including Gatekeeper. This passes
  3. The PR is merged
  4. The deploy pipeline succeeds and updates the Deployment
  5. Gatekeeper blocks the creation of the new Pods because they violate some of the policies
  6. We have to send a new PR to fix the violations, which can only be tested after merge

Desired state: the pipeline should already fail on the PR if any of the policies are violated.

Since the policies are only checked against Pods but the PR contains Deployments, this situation keeps repeating. The same problem can happen with other workload resources that create Pods.

Is there a solution or workaround for this problem?

One idea we have is to evaluate all policies defined for a Pod's .spec against a Deployment's .spec.template.spec, but that would require changes to these policies, which would make it hard to keep them up to date. I also feel that this must be a common problem, so I'd prefer a solution that others can easily use too (e.g. if we end up making this change, we are happy to send a PR with it).

markszabo avatar Apr 07 '22 07:04 markszabo

I had some time to look into this, and I think a simple change to the policies could make this possible. For example, for allowedrepos:

package k8sallowedrepos

# Containers of a bare Pod.
containers[c] {
  input.review.object.kind == "Pod"
  c := input.review.object.spec.containers[_]
}

# Containers of workload kinds that embed a Pod template at spec.template.
containers[c] {
  input.review.object.kind == ["Deployment", "ReplicaSet", "StatefulSet", "DaemonSet", "Job", "ReplicationController"][_]
  c := input.review.object.spec.template.spec.containers[_]
}

# CronJob nests the Pod template one level deeper, under spec.jobTemplate.
containers[c] {
  input.review.object.kind == "CronJob"
  c := input.review.object.spec.jobTemplate.spec.template.spec.containers[_]
}

# Flag every collected container whose image matches none of the allowed repo prefixes.
violation[{"msg": msg}] {
  container := containers[_]
  satisfied := [good | repo = input.parameters.repos[_] ; good = startswith(container.image, repo)]
  not any(satisfied)
  msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
}

(and then of course the same for initContainers.)
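
For illustration, a minimal sketch of what the initContainers variant could look like, following the same pattern as above. The rule name init_containers is just a name picked for this example, and the snippet is untested; it is meant to live in the same package next to the containers rules.

```rego
package k8sallowedrepos

# initContainers of a bare Pod.
init_containers[c] {
  input.review.object.kind == "Pod"
  c := input.review.object.spec.initContainers[_]
}

# initContainers of workload kinds that embed a Pod template at spec.template.
init_containers[c] {
  input.review.object.kind == ["Deployment", "ReplicaSet", "StatefulSet", "DaemonSet", "Job", "ReplicationController"][_]
  c := input.review.object.spec.template.spec.initContainers[_]
}

# CronJob nests the Pod template one level deeper, under spec.jobTemplate.
init_containers[c] {
  input.review.object.kind == "CronJob"
  c := input.review.object.spec.jobTemplate.spec.template.spec.initContainers[_]
}

# Same check as for regular containers, applied to the collected initContainers.
violation[{"msg": msg}] {
  container := init_containers[_]
  satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
  not any(satisfied)
  msg := sprintf("initContainer <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
}
```

Note that this only takes effect if the Constraint's match.kinds also lists the workload kinds (and their API groups); otherwise only Pods reach the policy.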

I'm happy to prepare a PR with these, if you are willing to give this idea a shot.

markszabo avatar May 24 '22 06:05 markszabo

@markszabo Thanks for raising this! We are looking at a design to support workload resource validation. Would love your thoughts on it. In the meantime, can you see if the gator test CLI command can help?

ritazh avatar May 24 '22 13:05 ritazh

@ritazh thanks for sharing the document. I think it captures all of our use cases (and also made me realize how complicated this problem is). Do you have a timeline for when this will be implemented?

markszabo avatar May 25 '22 01:05 markszabo

It's actively being worked on. I don't want to project a date, but here is a PR that is starting the work stream:

https://github.com/open-policy-agent/gatekeeper/pull/2062

Once that gets submitted, the next step would be to add the capability to the webhook.

maxsmythe avatar May 25 '22 02:05 maxsmythe

This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 31 '23 23:01 stale[bot]