gatekeeper
ReplicaSet explosion caused by conflicting mutations
What steps did you take and what happened:
Using OPA Gatekeeper to mutate a ReplicaSet owned by a Deployment may cause significant cluster stability problems due to a ReplicaSet explosion caused by conflicting mutations. See the issue description below for recreate instructions.
What did you expect to happen:
We recommend that the OPA documentation and/or code warn against mutation of replicasets owned by a deployment.
Anything else you would like to add:
https://github.com/kubernetes/kubernetes/issues/57167
https://docs.google.com/document/d/10LFy30JTfTD3qgCsBZ2S8ZpuWao9mqT_xqkcbvPzVf4/
Environment: IBM Cloud Kubernetes Service
- Gatekeeper version: 3.12
- Kubernetes version (use kubectl version): 1.28
Recreate Scenario
Install Open Policy Agent (OPA) Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.12.0/deploy/gatekeeper.yaml
kubectl rollout status deployment -n gatekeeper-system gatekeeper-audit
kubectl rollout status deployment -n gatekeeper-system gatekeeper-controller-manager
Create Test Deployments
for i in $(seq 10); do
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: restricted-$i
  name: restricted-$i
spec:
  selector:
    matchLabels:
      run: restricted-$i
  template:
    metadata:
      labels:
        run: restricted-$i
    spec:
      containers:
      - name: restricted-$i
        image: us.icr.io/armada-master/pause:3.9
        securityContext:
          privileged: false
          runAsUser: 1000
          runAsGroup: 1000
EOF
done
for i in $(seq 10); do
kubectl rollout status deployment restricted-$i
done
Verify Test Deployments BEFORE OPA Gatekeeper Mutation
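Before running the subsections below, a quick baseline check (illustrative; this assumes the ten test deployments are the only workloads in the current namespace) should show exactly one ReplicaSet per Deployment:

```shell
# Each of the 10 test deployments should own exactly one ReplicaSet at
# this point, so the count should be 10 in a clean namespace.
kubectl get rs --no-headers | wc -l
```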
Restart Deployments
kubectl get rs --no-headers | wc -l
for i in $(seq 10); do
kubectl rollout restart deployment restricted-$i
done
for i in $(seq 10); do
kubectl rollout status deployment restricted-$i
done
kubectl get rs --no-headers | wc -l
Scale Deployments
kubectl get rs --no-headers | wc -l
for i in $(seq 10); do
kubectl scale deployment restricted-$i --replicas=2
done
for i in $(seq 10); do
kubectl rollout status deployment restricted-$i
done
kubectl get rs --no-headers | wc -l
Create OPA Gatekeeper Mutator
kubectl apply -f - <<EOF
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: mutator
spec:
  applyTo:
  - groups:
    - apps
    kinds:
    - ReplicaSet
    versions:
    - v1
  - groups:
    - apps
    kinds:
    - Deployment
    versions:
    - v1
  location: spec.template.spec.containers[name:*].securityContext.allowPrivilegeEscalation
  match:
    kinds:
    - apiGroups:
      - apps
      kinds:
      - ReplicaSet
    - apiGroups:
      - apps
      kinds:
      - Deployment
    scope: Namespaced
  parameters:
    assign:
      value: false
EOF
Verify Test Deployments AFTER OPA Gatekeeper Mutation
Restart a Deployment - No Problems
kubectl get rs --no-headers | wc -l
kubectl rollout restart deployment restricted-1
kubectl rollout status deployment restricted-1
sleep 10
kubectl get rs --no-headers | wc -l
Delete ReplicaSet for a Deployment - Some Problems
kubectl get rs --no-headers | wc -l
kubectl delete replicaset -l run=restricted-2
sleep 10
kubectl get rs --no-headers | wc -l
Delete Pods for a Deployment - No Problems
kubectl get rs --no-headers | wc -l
kubectl delete pod -l run=restricted-3
sleep 10
kubectl get rs --no-headers | wc -l
Scale a Deployment - Big Problems
# Get some popcorn, find a comfortable chair, and watch the fireworks.
kubectl get rs --no-headers | wc -l
kubectl scale deployment restricted-10 --replicas=3
kubectl get rs --no-headers | wc -l
for i in $(seq 20); do
kubectl get rs --no-headers | wc -l
sleep 6
done
Fix Test Deployments
kubectl get rs --no-headers | wc -l
for i in $(seq 10); do
kubectl rollout restart deployment restricted-$i
done
for i in $(seq 10); do
kubectl rollout status deployment restricted-$i
done
kubectl get rs --no-headers | wc -l
for i in $(seq 20); do
kubectl get rs --no-headers | wc -l
sleep 6
done
Restart Cluster
kubectl delete -f - <<EOF
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: mutator
EOF
for i in $(seq 10); do
kubectl delete deployment restricted-$i
done
I can think of a workaround: do not match both ReplicaSets and Deployments. Instead, check that a ReplicaSet's metadata.ownerReferences is null (i.e., the ReplicaSet is unmanaged) and skip all other ReplicaSets.
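For anyone scoping a mutator along these lines, one quick way to see which ReplicaSets in a namespace are unmanaged (no ownerReferences) is a jq filter over the API output; this is just an illustrative one-liner, not part of any fix:

```shell
# List ReplicaSets that have no ownerReferences, i.e., ones not managed
# by a Deployment. These are the only ReplicaSets a mutator should touch.
kubectl get rs -o json \
  | jq -r '.items[] | select(.metadata.ownerReferences == null) | .metadata.name'
```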
Thanks for raising this @rtheis! Is there a reason you cannot match and apply to Pod and change the location?
e.g.:
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: mutator
spec:
  applyTo:
  - groups: [""]
    kinds: ["Pod"]
    versions: ["v1"]
  match:
    scope: Namespaced
    kinds:
    - apiGroups: ["*"]
      kinds: ["Pod"]
  location: "spec.containers[name:*].securityContext.allowPrivilegeEscalation"
...
Another example similar to this: https://open-policy-agent.github.io/gatekeeper/website/docs/mutation#adding-dnspolicy-and-dnsconfig-to-a-pod
@jiahuif @ritazh thank you. We certainly can, and we did work with folks to modify the match to ignore ReplicaSets owned by a Deployment.
You could use expansion templates if you want to target both Deployments and Pods (and other templates implementing pod template). Not required, but it's just an option in case you weren't aware of the feature.
https://open-policy-agent.github.io/gatekeeper/website/docs/expansion
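For reference, a minimal ExpansionTemplate along the lines of that doc might look like the following (a sketch based on the linked documentation; the metadata name is illustrative):

```yaml
apiVersion: expansion.gatekeeper.sh/v1alpha1
kind: ExpansionTemplate
metadata:
  name: expand-deployments
spec:
  applyTo:
  - groups: ["apps"]
    kinds: ["Deployment"]
    versions: ["v1"]
  # Expand the pod template of each Deployment into a resultant Pod,
  # so Pod-targeting mutators and constraints apply to it at admission.
  templateSource: "spec.template"
  generatedGVK:
    kind: "Pod"
    group: ""
    version: "v1"
```

With this in place, a mutator that matches Pods also applies to the pod template of Deployments at admission time, without ever matching ReplicaSets directly.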
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Ping