oci-hook: allow users to set a list of namespace exceptions and define default
Prior to this commit, kube-system was always added as an exception so that Pods in that namespace could still be created if the OCI hook was unable to reach the Tetragon agent. This leads to a deadlock when Tetragon itself is not installed in kube-system, because Tetragon's own Pods are then not covered by the exception.
Instead of always adding kube-system, this change always adds the namespace Tetragon is deployed in. In addition, a user can now define additional namespaces as further exceptions, for example to ensure that Pods in business-critical namespaces can still be created even if the OCI hook fails to reach the Tetragon agent.
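The decision described above can be sketched roughly as follows. This is only an illustration of the behaviour, not the hook's actual code: the function name `allowOnFailure` and its parameters are hypothetical, and the sketch assumes the exception list is already available as a slice of namespace names.

```go
package main

import "fmt"

// allowOnFailure reports whether a Pod in podNamespace may still be created
// when the hook cannot reach the agent. The name and signature are
// hypothetical and only illustrate the behaviour described in this PR.
func allowOnFailure(podNamespace, tetragonNamespace string, extra []string) bool {
	// The namespace Tetragon runs in is always an exception, so Tetragon's
	// own Pods can start even while the agent is unreachable.
	if podNamespace == tetragonNamespace {
		return true
	}
	// User-defined exceptions, e.g. business-critical namespaces.
	for _, ns := range extra {
		if podNamespace == ns {
			return true
		}
	}
	return false
}

func main() {
	extra := []string{"business-critical-a", "business-critical-b"}
	for _, ns := range []string{"tetragon", "business-critical-a", "business-critical-c", "kube-system"} {
		fmt.Printf("%s allowed=%t\n", ns, allowOnFailure(ns, "tetragon", extra))
	}
}
```

With Tetragon deployed in `tetragon`, kube-system is no longer allowed implicitly in this sketch; it would have to be listed explicitly.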
Install Tetragon in a namespace other than kube-system:
helm install --namespace tetragon \
--set tetragonOperator.image.override=localhost/cilium/tetragon-operator:latest \
--set tetragon.image.override=localhost/cilium/tetragon:latest \
--set tetragon.grpc.address="unix:///var/run/cilium/tetragon/tetragon.sock" \
--set tetragon.ociHookSetup.enabled=true \
tetragon ./install/kubernetes/tetragon
$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
tetragon-46lr7                       2/2     Running   0          13s
tetragon-operator-7c8cc5964f-jzxdx   1/1     Running   0          13s
Fixes https://github.com/cilium/tetragon/issues/2402
Install Tetragon via Helm with the new failAllowNamespaces value:
helm install --wait --namespace tetragon \
--set tetragonOperator.image.override=localhost/cilium/tetragon-operator:latest \
--set tetragon.image.override=localhost/cilium/tetragon:latest \
--set tetragon.grpc.address='unix:///var/run/cilium/tetragon/tetragon.sock' \
--set tetragon.ociHookSetup.enabled=true \
--set tetragon.ociHookSetup.failAllowNamespaces='business-critical-a\,business-critical-b' \
tetragon ./install/kubernetes/tetragon
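In the command above, the backslash before the comma keeps Helm's --set parser from treating the comma as a separator between values, so the chart receives the single string business-critical-a,business-critical-b. How the hook consumes that string is not shown in this description; the following is a minimal sketch assuming it is split on commas and that Tetragon's own namespace is always placed on the list. The function name `parseFailAllowNamespaces` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFailAllowNamespaces builds the exception list from a comma-separated
// value. Hypothetical helper: it assumes the value arrives as one string and
// that Tetragon's own namespace is always included, even for an empty value.
func parseFailAllowNamespaces(tetragonNamespace, value string) []string {
	namespaces := []string{tetragonNamespace}
	for _, ns := range strings.Split(value, ",") {
		ns = strings.TrimSpace(ns)
		// Skip empty entries and avoid listing Tetragon's namespace twice.
		if ns == "" || ns == tetragonNamespace {
			continue
		}
		namespaces = append(namespaces, ns)
	}
	return namespaces
}

func main() {
	fmt.Println(parseFailAllowNamespaces("tetragon", "business-critical-a,business-critical-b"))
	// prints: [tetragon business-critical-a business-critical-b]
	fmt.Println(parseFailAllowNamespaces("tetragon", ""))
	// prints: [tetragon]
}
```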
Uninstall Tetragon to trigger an OCI hook error (the hook can no longer reach the agent):
helm uninstall --namespace tetragon tetragon
Create various namespaces:
kubectl create ns business-critical-a
kubectl create ns business-critical-b
kubectl create ns business-critical-c
Create a deployment in each namespace:
kubectl create deploy -n business-critical-a nginx --image docker.io/nginx
kubectl create deploy -n business-critical-b nginx --image docker.io/nginx
kubectl create deploy -n business-critical-c nginx --image docker.io/nginx
As can be seen, the Pods in business-critical-a and business-critical-b were created and are Running, whereas business-critical-c was not part of the list and therefore continues to fail:
$ kubectl get pods -A -l app=nginx
NAMESPACE             NAME                     READY   STATUS                 RESTARTS   AGE
business-critical-a   nginx-68fd89949f-8h4xg   1/1     Running                0          12s
business-critical-b   nginx-68fd89949f-q7l2j   1/1     Running                0          12s
business-critical-c   nginx-68fd89949f-kjflf   0/1     CreateContainerError   0          12s
Fixes https://github.com/cilium/tetragon/issues/2403
I think it makes sense to add @kkourt as a reviewer as he knows the most about the OCI hook implementation at this point.
Thanks for the review!
Added a small hint to clarify that Tetragon's namespace is always added (even if the user leaves failAllowNamespaces empty).
As per your last comment this is already accounted for, see this comment.
One final request: Can you squash the changes from my feedback into a single commit? (git rebase --interactive using the squash and fixup actions should help).
Done 👍