Creating a TracingPolicyNamespaced with the same name in a different namespace does not get applied.

Open joshuajorel opened this issue 1 year ago • 14 comments

What happened?

In our K8s environment, we deployed the same policy to two different namespaces, but only the first policy gets applied. We confirmed this by running the tetra tp list command in the Tetragon pods. We tested this behavior with the following fd-install TracingPolicyNamespaced policy in two namespaces (default and test):
apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata:
  name: "fd-install"
spec:
  kprobes:
  - call: "fd_install"
    syscall: false
    args:
    - index: 0
      type: "int"
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/tmp/tetragon"
      matchActions:
      - action: Sigkill

The following is the output of tetra tp list:

[5] fd-install enabled:true filterID:5 namespace:default sensors:gkp-sensor-5

Only one policy is applied. The policy in the test namespace is not applied, even though the k8s resource exists.

Tetragon Version

CLI version: v1.0.1
Server version: v1.0.2 (installed via Helm)

Kernel Version

Linux ubuntu-noble 6.8.0-11-generic #11-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 14 00:29:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3

Bugtool

time="2024-04-03T23:20:06Z" level=info msg="saving init info"
time="2024-04-03T23:20:06Z" level=info msg="retrieving lib directory" libDir=/var/lib/tetragon/
time="2024-04-03T23:20:06Z" level=warning msg="not an object file, ignoring" path=/var/lib/tetragon/
time="2024-04-03T23:20:10Z" level=info msg="skipping metadata directory" path=/var/lib/tetragon/metadata
time="2024-04-03T23:20:10Z" level=warning msg="no btf filename in tetragon config, attempting to fall back to /sys/kernel/btf/vmlinux"
time="2024-04-03T23:20:11Z" level=info msg="btf file added" btfFname=/sys/kernel/btf/vmlinux
time="2024-04-03T23:20:11Z" level=info msg="tetragon log file added" exportFname=/var/run/cilium/tetragon/tetragon.log
time="2024-04-03T23:20:11Z" level=info msg="contacting metrics server" metricsAddr="http://localhost:2112/metrics"
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd=/bin/dmesg dstFname=dmesg.out ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev lo ingress" dstFname=tc-info.lo.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev lo egress" dstFname=tc-info.lo.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev eth0 ingress" dstFname=tc-info.eth0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev eth0 egress" dstFname=tc-info.eth0.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethf85e1c33 ingress" dstFname=tc-info.vethf85e1c33.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethf85e1c33 egress" dstFname=tc-info.vethf85e1c33.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd2bc04e0 ingress" dstFname=tc-info.vethd2bc04e0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd2bc04e0 egress" dstFname=tc-info.vethd2bc04e0.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd92333b8 ingress" dstFname=tc-info.vethd92333b8.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethd92333b8 egress" dstFname=tc-info.vethd92333b8.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethc9f1bfea ingress" dstFname=tc-info.vethc9f1bfea.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethc9f1bfea egress" dstFname=tc-info.vethc9f1bfea.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdc8843f6 ingress" dstFname=tc-info.vethdc8843f6.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdc8843f6 egress" dstFname=tc-info.vethdc8843f6.egress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdd056db0 ingress" dstFname=tc-info.vethdd056db0.ingress ret=0
time="2024-04-03T23:20:11Z" level=info msg="executed command" cmd="/sbin/tc filter show dev vethdd056db0 egress" dstFname=tc-info.vethdd056db0.egress ret=0
time="2024-04-03T23:20:12Z" level=info msg="executed command" cmd="/usr/bin/bpftool map show -j" dstFname=bpftool-maps.json ret=0
time="2024-04-03T23:20:12Z" level=info msg="executed command" cmd="/usr/bin/bpftool prog show -j" dstFname=bpftool-progs.json ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/bpftool cgroup tree -j" dstFname=bpftool-cgroups.json ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops stack localhost:8118" dstFname=gops.stack ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops stats localhost:8118" dstFname=gpos.stats ret=0
time="2024-04-03T23:20:13Z" level=info msg="executed command" cmd="/usr/bin/gops memstats localhost:8118" dstFname=gops.memstats ret=0
time="2024-04-03T23:20:13Z" level=info msg="dumped tracing policies in tracing-policies.json"

Relevant log output

No response

Anything else?

No response

joshuajorel, Apr 05 '24 07:04

Indeed, this is currently the case: the policy name must be unique across all policies. I believe this also includes non-namespaced policies. This can be fixed, but it requires a significant number of changes. My suggestion would be to use different policy names.
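Until this is fixed, following the suggestion to use different policy names means keeping every name unique across namespaces. The following is a minimal Go sketch of a pre-deployment check. It assumes you have already collected the (namespace, name) pairs of all TracingPolicy and TracingPolicyNamespaced objects, for example from kubectl output. The policyRef type and the duplicateNames function are hypothetical and are not part of Tetragon.

```go
package main

import (
	"fmt"
	"sort"
)

// policyRef is a hypothetical description of one policy object.
// Namespace is assumed to be empty for cluster-wide TracingPolicy objects.
type policyRef struct {
	Namespace string
	Name      string
}

// duplicateNames returns, in sorted order, every policy name that is used by
// more than one policy object, regardless of namespace. Under the limitation
// described above, only one policy per such name would be loaded.
func duplicateNames(policies []policyRef) []string {
	count := make(map[string]int)
	for _, p := range policies {
		count[p.Name]++
	}
	var dups []string
	for name, n := range count {
		if n > 1 {
			dups = append(dups, name)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	policies := []policyRef{
		{Namespace: "default", Name: "fd-install"},
		{Namespace: "test", Name: "fd-install"},
		{Namespace: "", Name: "file-monitoring"},
	}
	fmt.Println(duplicateNames(policies)) // [fd-install]
}
```

Renaming one of the colliding objects, for example to a name that includes its namespace, makes the check pass.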

kkourt, Apr 05 '24 09:04

Is that the intended behavior? I can understand that policy names need to be unique for non-namespaced policies. However, from a k8s perspective, it seems counterintuitive not to allow this.

joshuajorel, Apr 08 '24 03:04

It's not the intended behavior, and I agree that it's counterintuitive.

Originally, Tetragon did not support namespaced policies, so we used the policy name as the key that uniquely identifies a policy. When we introduced namespaced policies, this was not changed, which left us with the above limitation.

Internally, we maintain a mapping from a string (the policy name) to a collection: https://github.com/cilium/tetragon/blob/bd63a46bf4d5927587c5291f57433244aea201ac/pkg/sensors/handler.go#L17

A collection is the internal state we keep for each policy: https://github.com/cilium/tetragon/blob/bd63a46bf4d5927587c5291f57433244aea201ac/pkg/sensors/collection.go#L39-L42
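To illustrate the limitation, here is a minimal Go sketch of a registry keyed only by policy name. It is a simplified model, not Tetragon's code. The collection and nameKeyedHandler types are hypothetical. The sketch also assumes that a second policy with an existing name is rejected rather than overwriting the first. That assumption matches the reported tetra tp list output but is not taken from the Tetragon source.

```go
package main

import "fmt"

// collection is a hypothetical stand-in for Tetragon's per-policy state.
type collection struct {
	name      string
	namespace string
}

// nameKeyedHandler models a registry keyed only by the policy name.
type nameKeyedHandler struct {
	collections map[string]*collection
}

// addPolicy refuses a policy whose name is already registered, even if it
// comes from a different namespace, so the first policy "wins".
func (h *nameKeyedHandler) addPolicy(name, namespace string) error {
	if _, exists := h.collections[name]; exists {
		return fmt.Errorf("policy %q already exists", name)
	}
	h.collections[name] = &collection{name: name, namespace: namespace}
	return nil
}

func main() {
	h := &nameKeyedHandler{collections: make(map[string]*collection)}
	fmt.Println(h.addPolicy("fd-install", "default")) // <nil>
	fmt.Println(h.addPolicy("fd-install", "test"))    // policy "fd-install" already exists
	fmt.Println(len(h.collections))                   // 1
}
```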

Changing the code so that we use something like the following as the key:

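type collectionKey struct {
    // A policy is identified by its name and namespace together; the
    // namespace is empty for cluster-wide (non-namespaced) policies.
    name, namespace string
}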

should allow us to have the same policy name in different namespaces.
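The following minimal Go sketch shows the proposed change. It uses a struct key, so two policies with the same name in different namespaces map to different entries. Go allows a struct to be used directly as a map key when all of its fields are comparable, and strings are comparable. The key is written here as collectionKey, the usual Go camelCase spelling. The handler, collection, and addPolicy names are hypothetical. Treating cluster-wide policies as having an empty namespace is also an assumption.

```go
package main

import "fmt"

// collectionKey identifies a policy by name and namespace together.
// Namespace is assumed to be "" for cluster-wide TracingPolicy objects.
type collectionKey struct {
	name, namespace string
}

// collection is a hypothetical stand-in for Tetragon's per-policy state.
type collection struct{}

// handler is a hypothetical registry keyed by collectionKey instead of by
// the bare policy name.
type handler struct {
	collections map[collectionKey]*collection
}

// addPolicy rejects a policy only if the same name already exists in the
// same namespace.
func (h *handler) addPolicy(name, namespace string) error {
	key := collectionKey{name: name, namespace: namespace}
	if _, exists := h.collections[key]; exists {
		return fmt.Errorf("policy %q already exists in namespace %q", name, namespace)
	}
	h.collections[key] = &collection{}
	return nil
}

func main() {
	h := &handler{collections: make(map[collectionKey]*collection)}
	fmt.Println(h.addPolicy("fd-install", "default")) // <nil>
	fmt.Println(h.addPolicy("fd-install", "test"))    // <nil>
	fmt.Println(h.addPolicy("fd-install", "test"))    // policy "fd-install" already exists in namespace "test"
	fmt.Println(len(h.collections))                   // 2
}
```

Lookups, deletions, and listings would all need to take the namespace into account, which is presumably part of why the change touches many places.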

kkourt, Apr 08 '24 06:04

Would this be something the community would be interested in? I can contribute the change if it's not already being worked on.

joshuajorel, Apr 09 '24 04:04

> Would this be something the community would be interested in? I can contribute the change if it's not already being worked on.

We discussed this in the community call yesterday (https://docs.google.com/document/d/1BFMJLdtisiCSLfMct0GHof_ioL-5QVNLEaeMSlk_7Eo/edit), and the consensus was that this is something the community would definitely be interested in.

I'm not aware of anyone working on it, and we would gladly take this contribution. Happy to also guide along the way.

Thanks!

kkourt, Apr 09 '24 06:04

@kkourt I created a draft PR here: https://github.com/cilium/tetragon/pull/2337

The namespace policy does get separated:

[kind-tetragon-dev|kube-system] (base) ➜  ~ kubectl exec  ds/tetragon -c tetragon -- tetra tp list

ID   NAME                       STATE     FILTERID   NAMESPACE   SENSORS
2    file-monitoring-filtered   enabled   2          test        gkp-sensor-2
3    file-monitoring-filtered   enabled   3          test2       gkp-sensor-3

However, the policy doesn't seem to capture the events. Any clue as to where I should look?

joshuajorel, Apr 16 '24 05:04

> @kkourt I created a draft PR here: #2337
>
> The namespace policy does get separated:
>
> [kind-tetragon-dev|kube-system] (base) ➜  ~ kubectl exec  ds/tetragon -c tetragon -- tetra tp list
>
> ID   NAME                       STATE     FILTERID   NAMESPACE   SENSORS
> 2    file-monitoring-filtered   enabled   2          test        gkp-sensor-2
> 3    file-monitoring-filtered   enabled   3          test2       gkp-sensor-3

Cool, thanks!

> However, the policy doesn't seem to capture the events. Any clue as to where I should look?

Does everything work as expected if the policies have different names?

kkourt, Apr 16 '24 06:04

@kkourt the policies do not seem to be enforced. I also don't see process_exit events as you normally should. Any suggestions where to look next?

joshuajorel, Apr 16 '24 10:04

> @kkourt the policies do not seem to be enforced. I also don't see process_exit events as you normally should. Any suggestions where to look next?

So you mean that even if the policy names are not the same, the policies do not take effect? Can you please open a separate issue for this? Please include a sysdump, the policies themselves, and what the expected and actual results of the policies are.

kkourt, Apr 16 '24 11:04

@kkourt - Just did a sanity check: I rebuilt the codebase from the main branch without these code changes, and indeed the sample policies are not taking effect. Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

joshuajorel, Apr 16 '24 12:04

> @kkourt - Just did a sanity check: I rebuilt the codebase from the main branch without these code changes, and indeed the sample policies are not taking effect. Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

I'm not sure about WSL2, but I wouldn't be surprised if there was an issue with it. Can you please create another issue for it? It should be possible to figure out what's wrong with a sysdump.

kkourt, Apr 16 '24 12:04

@kkourt created a separate issue here: https://github.com/cilium/tetragon/issues/2338

joshuajorel, Apr 16 '24 12:04

> @kkourt created a separate issue https://github.com/cilium/tetragon/issues/2338

thanks!

> Are there any known issues using WSL2? Otherwise, I will have to test in a different environment to confirm this behavior.

It seems that WSL2 is not working properly. We would need to investigate further to figure out how to address the issue. In the meantime, would it be possible to use another environment (e.g., a normal Linux VM) for testing? Thanks!

kkourt, Apr 17 '24 09:04

@kkourt I will reinstall my tools in a VM and continue testing there.

joshuajorel, Apr 18 '24 03:04