NotDAddr/NotSAddr filters don't work
What happened? How can we reproduce this?
Filters for tcp_connect/tcp_sendmsg that use the NotDAddr/NotSAddr operators are simply ignored. This is new behaviour after updating from 0.8.0 to 1.5.0. It looks like the issue might have been introduced by the fix for https://github.com/cilium/tetragon/issues/3712#event-17636040041. Exclusions are now not processed correctly.
Tetragon Version
1.5.0, but the same code still exists in 1.6.0
Kernel Version
6.8.0-85-generic
Kubernetes Version
1.31
Bugtool
No response
Relevant log output
Anything else?
No response
Hi! We have some tests that use these operators, e.g. https://github.com/cilium/tetragon/blob/main/pkg/sensors/tracing/kprobe_net_test.go#L566, and they seem to work fine. Can you share the policy you are using?
Hi. Sure. Right now it is as simple as this:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "monitor-network-activity-outside-local-nets"
  annotations:
    # argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"
    selectors:
    - matchArgs:
      - index: 0
        operator: "NotDAddr"
        values:
        - "10.0.0.0/8"
        - "172.16.0.0/12"
        - "192.168.0.0/16"
        - "127.0.0.1/8"
SAddr and DAddr filters work correctly, but any Not filter is simply ignored.
I don't read the code fluently, but I suspect the issue might be in the basic.h file, after "IPv4 in IPv6" format compatibility was introduced. I can't be sure, because I am not a C developer, but some constructs look like the pure IPv4 parsing might have gone wrong once that compatibility was added.
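To make the "IPv4 in IPv6" suspicion concrete, here is a minimal Python sketch of how an IPv4-mapped IPv6 address can slip past an IPv4 range check unless it is unwrapped first. This only illustrates the general pitfall with Python's standard ipaddress module; it is not a description of what Tetragon's BPF code does. The function names and the sample address are hypothetical, chosen for illustration.

```python
import ipaddress

# One of the private ranges from the policy above.
LOCAL_NET = ipaddress.ip_network("172.16.0.0/12")


def in_local_net(addr):
    """Naive check: compare the address to the IPv4 range as given."""
    return ipaddress.ip_address(addr) in LOCAL_NET


def in_local_net_unmapped(addr):
    """Unwrap an IPv4-mapped IPv6 address (::ffff:a.b.c.d) before comparing."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip in LOCAL_NET


plain = "172.18.89.5"          # illustrative address inside 172.16.0.0/12
mapped = "::ffff:172.18.89.5"  # the same address in IPv4-mapped form

print(in_local_net(plain), in_local_net(mapped))                    # True False
print(in_local_net_unmapped(plain), in_local_net_unmapped(mapped))  # True True
```

If something similar happened in the filter, an exclusion written as an IPv4 CIDR would never match a mapped destination, and a Not filter would let the event through. Whether that is what actually happens here is an assumption that would need to be checked against the real code.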
The issue occurs across several clusters in our environment, and I don't have another cluster to test on right now.
That's really interesting; I tried updating the aforementioned test case by adding more values to NotDAddr:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "tcp-connect"
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"
    selectors:
    - matchArgs:
      - index: 0
        operator: "NotDAddr"
        values:
        - "10.0.0.0/8"
        - "172.16.0.0/12"
        - "192.168.0.0/16"
      - index: 0
        operator: "DPort"
        values:
        - "9934"
      - index: 0
        operator: "Protocol"
        values:
        - "IPPROTO_TCP"
and it works fine. Then, if I add - "127.0.0.1/8", it starts to fail, because the test itself does a TCP connect to:
addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:9934")
suite.Require().NoError(err)
_, err = net.DialTCP("tcp", nil, addr)
suite.Require().NoError(err)
Therefore it seems like the filter itself is working just fine.
I also tried copying your exact policy: with - "127.0.0.1/8" removed it works fine, while adding it back makes the test fail, as expected.
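The behaviour described above can be sketched in a few lines of Python, assuming NotDAddr keeps an event only when the destination is outside every listed range. That reading is taken from this thread rather than from the implementation, and not_daddr is a hypothetical helper, not a Tetragon function.

```python
import ipaddress


def not_daddr(dst, cidrs):
    """Return True when dst lies outside every listed range (event is kept)."""
    ip = ipaddress.ip_address(dst)
    # strict=False accepts "127.0.0.1/8", which has host bits set.
    nets = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    return not any(ip in net for net in nets)


base = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

# The test dials 127.0.0.1:9934, so it expects an event for that destination.
print(not_daddr("127.0.0.1", base))                    # True: event is kept
print(not_daddr("127.0.0.1", base + ["127.0.0.1/8"]))  # False: event is filtered
```

With the loopback range in the list, the test's own connection is excluded, so no event arrives and the test fails. That is consistent with the filter working.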
Okay, so I've updated to 1.6.0 and did some additional tests. It looks like only 10 out of 40 clusters are affected. Nothing they have in common distinguishes them from the rest. They are a random mix of:
Ubuntu 24.04 minor releases (1, 2, 3)
6.8.0 kernel minor versions (from 51 to 85)
containerd 1.7.1, 1.7.4
k8s v1.30.4, 1.31.4
What I have checked:
I picked one specific container that regularly generates log entries that should have been excluded by the filter.
In this case it is dagster, which connects (from Python) to another pod in the cluster (the local cluster network is inside the 172.16.0.0/12 subnet, so those connections should be excluded).
I exec'd into this pod and manually connected with Python to the same endpoint as in the log, like this:
python3.10 -c "import socket; s=socket.create_connection(('172.21.65.36',3030)); print('connected to',s.getpeername())"
Nothing is logged. Just to be sure, I connected with the same command line to an external endpoint that definitely should be logged, and it is logged. So it looks like the filter and logging work fine in this case.
But Tetragon still logs those connections from dagster itself.
For example:
staging-dagster-webserver-844c6874f7-qq88f /usr/local/bin/python3.10 tcp 172.24.89.188:36948 -> 172.18.89.5:3030
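As a quick sanity check that this entry really should have been excluded, here is a small Python sketch that pulls the destination out of a line in this format and looks for a policy range that covers it. The parsing assumes the "src:port -> dst:port" layout of the single sample above; destination and covering_range are hypothetical helper names.

```python
import ipaddress

# Ranges from the NotDAddr policy; strict=False tolerates "127.0.0.1/8".
EXCLUDED = [
    ipaddress.ip_network(cidr, strict=False)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.1/8")
]

SAMPLE = (
    "staging-dagster-webserver-844c6874f7-qq88f /usr/local/bin/python3.10 "
    "tcp 172.24.89.188:36948 -> 172.18.89.5:3030"
)


def destination(line):
    """Extract the destination IP from a '... src:port -> dst:port' line."""
    dst = line.rsplit("->", 1)[1].strip()
    host, _, _port = dst.rpartition(":")
    return host


def covering_range(line):
    """Return the first excluded range containing the destination, or None."""
    ip = ipaddress.ip_address(destination(line))
    for net in EXCLUDED:
        if ip in net:
            return net
    return None


print(destination(SAMPLE), covering_range(SAMPLE))  # 172.18.89.5 172.16.0.0/12
```

The destination falls inside 172.16.0.0/12, so under the policy as written this connection should not have produced an event.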