[QUESTION] What is the best way to update a TracingPolicy without losing enforcement?
Hi all! With this issue, I would like to understand the best way to update a TracingPolicy without losing protection.
Let's say I have a policy like this deployed in my cluster:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "policy-1"
spec:
  podSelector:
    matchLabels:
      app: "my-deployment-1"
  kprobes:
  - call: "security_bprm_creds_for_exec"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"
    selectors:
    - matchArgs:
      - index: 0
        operator: "NotEqual"
        values:
        - "/usr/bin/sleep"
        - "/usr/bin/cat"
        - "/usr/bin/my-server-1"
      matchActions:
      - action: Override
        argError: -1
  options:
  - name: disable-kprobe-multi
    value: "1"
At a certain point, I need to add a new value to my list (e.g., /usr/bin/ls).
If I update the TracingPolicy CR, my understanding is that the collection associated with the policy is deleted (so all the eBPF programs are detached), and then a new collection with the new values is loaded: https://github.com/cilium/tetragon/blob/9dea41615eae8a59b0433fa29fa01cf590dda41f/pkg/watcher/crdwatcher/tracingpolicy.go#L97-L110
So, if I understand the code correctly, there is a short interval during which the policy is no longer enforced on the system.
An alternative could be to create a new policy with a different name, with the updated list, and only when the new one is created, we delete the old one.
Does Tetragon provide an out-of-the-box solution for this kind of situation? Is using 2 policies the best solution to solve this issue?
Hello!
> Does Tetragon provide an out-of-the-box solution for this kind of situation? Is using 2 policies the best solution to solve this issue?
Today, I think using two policies is the best approach.
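To make the two-policy approach concrete, here is a minimal sketch of deriving the replacement policy from the existing one. It assumes the policy has been loaded into a plain Python dict (for example from its YAML); `derive_policy` is a hypothetical helper, not part of Tetragon, and it naively appends the new values to every `matchArgs` entry, which may be broader than you want for policies with several selectors.

```python
import copy


def derive_policy(old, new_name, extra_values):
    """Return a renamed deep copy of `old` with `extra_values` appended
    to the `values` list of every matchArgs entry (hypothetical helper)."""
    new = copy.deepcopy(old)  # never mutate the policy that is still enforced
    new["metadata"]["name"] = new_name
    for kprobe in new["spec"].get("kprobes", []):
        for selector in kprobe.get("selectors", []):
            for match in selector.get("matchArgs", []):
                values = match.setdefault("values", [])
                for value in extra_values:
                    if value not in values:  # skip values already present
                        values.append(value)
    return new
```

The resulting dict can then be serialized and applied as a second CR (for example `policy-2`), and `policy-1` is deleted only afterwards.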
If we want to do better, I think there are two approaches:
- Keep a copy of the tracing policy that the current eBPF programs are based on. When a policy update event happens, we can diff the two versions, and if they differ only in map contents, we can just update the BPF maps instead of removing and re-adding the policy.
- Doing the "diff", however, is tricky. Another approach would be to keep the values in a separate K8s resource and update them as that resource changes. This is similar to the template idea you proposed in #4191.
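As an illustration of the "diff" idea in the first bullet, the sketch below checks whether two policy specs differ only in their `matchArgs` value lists. It is not how Tetragon works today: `only_values_changed` is a hypothetical helper operating on plain dicts, and it assumes that `values` is the only field whose change could be handled by a map update, which a real implementation would have to verify per operator and argument type.

```python
import copy


def _without_values(spec):
    """Return a deep copy of `spec` with every matchArgs `values` list removed."""
    stripped = copy.deepcopy(spec)
    for kprobe in stripped.get("kprobes", []):
        for selector in kprobe.get("selectors", []):
            for match in selector.get("matchArgs", []):
                match.pop("values", None)
    return stripped


def only_values_changed(old_spec, new_spec):
    """True if the specs differ, but only inside matchArgs `values` lists."""
    if old_spec == new_spec:
        return False  # nothing changed at all
    return _without_values(old_spec) == _without_values(new_spec)
```

If this returns `True`, an in-place map update would be a candidate; any other change (a different hook, operator, or action) would still require reloading the policy.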
Hi @kkourt, thanks for the detailed explanation! I have a follow-up question about the two-policy approach.
Currently we use TracingPolicy and TracingPolicyNamespaced CRs to control Tetragon policies. One problem we've seen with the two-policy method is that there is no information about whether a TracingPolicy CR has taken effect or not.
Say I have a policy A, and I would like to add /usr/bin/bash to its arg selector. For that, I create a policy B based on policy A and append /usr/bin/bash to the list. However, unlike with the gRPC API, when I do this through a CR, I don't know whether policy B was installed correctly and has taken effect, so it's difficult for us to know when to delete the original policy A.
I can see that if Tetragon reported the status of tracing policies in the CR, the CR user would be able to do something with it, but I wonder if this is the recommended method to handle this?
> I can see that if Tetragon reported the status of tracing policies in the CR, the CR user would be able to do something with it, but I wonder if this is the recommended method to handle this?
We don't have a good way of dealing with this at the moment. You would have to check the policy status on every agent before uninstalling policy A. One thing that might help is that the policy state is exported in two ways: over gRPC and as Prometheus metrics.
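A rough sketch of that "check every agent, then uninstall" ordering is shown below. Everything here is an assumption made for illustration: the three callables (`create_new`, `fetch_states`, `delete_old`) are placeholders you would implement against the Kubernetes API and whichever status source you use (gRPC or metrics), and the `"enabled"` state string is a made-up label rather than Tetragon's actual state name.

```python
import time


def ready_on_all_agents(states, expected_agents, ready_state="enabled"):
    """True only if every expected agent reports `ready_state`.
    An agent missing from `states` counts as not ready."""
    return all(states.get(agent) == ready_state for agent in expected_agents)


def roll_over(create_new, fetch_states, delete_old, expected_agents,
              timeout_s=60.0, poll_s=1.0,
              sleep=time.sleep, clock=time.monotonic):
    """Install the new policy, wait until all agents report it ready,
    then delete the old one. Returns False (old policy kept) on timeout."""
    create_new()
    deadline = clock() + timeout_s
    while True:
        # fetch_states() returns {agent_name: state} for the NEW policy.
        if ready_on_all_agents(fetch_states(), expected_agents):
            delete_old()
            return True
        if clock() >= deadline:
            return False  # keep the old policy so enforcement is not lost
        sleep(poll_s)
```

On timeout the old policy stays in place, so the failure mode is two overlapping policies rather than a gap in enforcement.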
I think we can close this one. Thank you for the suggestions!