
NotDAddr/NotSAddr filters don't work

Open 3rgfbrgf opened this issue 3 months ago • 4 comments

What happened? How can we reproduce this?

Filters for tcp_connect/tcp_sendmsg that use the NotDAddr/NotSAddr operators are simply ignored. This behaviour is new since updating from 0.8.0 to 1.5.0. It looks like the issue may have been introduced by the fix for https://github.com/cilium/tetragon/issues/3712#event-17636040041, since exclusions are no longer processed correctly.

Tetragon Version

1.5.0 (the same code is still present in 1.6.0)

Kernel Version

6.8.0-85-generic

Kubernetes Version

1.31

Bugtool

No response

Relevant log output


Anything else?

No response

3rgfbrgf · Oct 30 '25 17:10

Hi! We have some tests that use these operators (e.g. https://github.com/cilium/tetragon/blob/main/pkg/sensors/tracing/kprobe_net_test.go#L566), and they seem to work fine. Can you share the policy you are using?

FedeDP · Oct 31 '25 14:10

Hi. Sure, at the moment it is as simple as this:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "monitor-network-activity-outside-local-nets"
  annotations:
    # argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"
    selectors:
    - matchArgs:
      - index: 0
        operator: "NotDAddr"
        values:
        - "10.0.0.0/8"
        - "172.16.0.0/12"
        - "192.168.0.0/16"
        - "127.0.0.1/8"

SAddr and DAddr filters work correctly, but any Not* filter is simply ignored.

I can't read the code fluently, but I suspect the issue might be in the basic.h file, after "IPv4 in IPv6" format compatibility was introduced. I'm not sure, because I'm not a C developer, but some constructs look like plain IPv4 parsing might be handled incorrectly since that compatibility was added.

The issue persists across several clusters in our environment, and I don't currently have another cluster to test it on.

3rgfbrgf · Oct 31 '25 14:10

That's really interesting; I tried updating the aforementioned test case by adding more values to NotDAddr:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "tcp-connect"
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"
    selectors:
    - matchArgs:
      - index: 0
        operator: "NotDAddr"
        values:
        - "10.0.0.0/8"
        - "172.16.0.0/12"
        - "192.168.0.0/16"
      - index: 0
        operator: "DPort"
        values:
        - "9934"
      - index: 0
        operator: "Protocol"
        values:
        - "IPPROTO_TCP"

and it works fine; then, if I add - "127.0.0.1/8", it starts to fail, because the test itself makes a TCP connection to 127.0.0.1:

addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:9934")
suite.Require().NoError(err)
_, err = net.DialTCP("tcp", nil, addr)
suite.Require().NoError(err)

Therefore the filter itself seems to work just fine. I also tried copy/pasting your exact policy: with - "127.0.0.1/8" removed it works fine, while adding it back makes the test fail, as expected.

FedeDP · Oct 31 '25 15:10

Okay, so I've updated to 1.6.0 and run some additional tests. It looks like only 10 of our 40 clusters are affected, and I can't find anything they have in common that distinguishes them from the rest. They run a random mix of:

- Ubuntu 24.04 point releases (24.04.1, 24.04.2, 24.04.3)
- 6.8.0 kernel builds (from 6.8.0-51 to 6.8.0-85)
- containerd 1.7.1 and 1.7.4
- Kubernetes v1.30.4 and v1.31.4

What I have checked: I picked one specific container that regularly generates unwanted log entries (entries the filter should exclude). In this case it is dagster, which connects (from Python) to another pod in the cluster; the local cluster network is inside the 172.16.0.0/12 subnet, so that traffic should be excluded. I exec into this pod and manually connect with Python to the same endpoint as in the log, like: python3.10 -c "import socket; s=socket.create_connection(('172.21.65.36',3030)); print('connected to',s.getpeername())"

Nothing is logged. Just to be sure, I connect with the same command line to an external endpoint that definitely should be logged, and it is logged. So in this case the filter and logging seem to work fine.

But Tetragon still logs the connections made by dagster itself, for example:

staging-dagster-webserver-844c6874f7-qq88f /usr/local/bin/python3.10 tcp 172.24.89.188:36948 -> 172.18.89.5:3030
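As a quick sanity check on that event: both endpoints of 172.24.89.188:36948 -> 172.18.89.5:3030 fall inside 172.16.0.0/12 (172.16.0.0 to 172.31.255.255), so the destination should have been excluded by the NotDAddr list. A small Go sketch that parses the "src -> dst" pair from such a log line and checks the destination; the line format is taken from the excerpt in this comment, and `excludedDst` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// NotDAddr ranges from the reporter's policy.
var policy = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("127.0.0.0/8"), // "127.0.0.1/8" in the policy, masked
}

// excludedDst parses "src:port -> dst:port" and reports whether dst falls
// in one of the NotDAddr ranges (i.e. the event should NOT have been logged).
func excludedDst(pair string) (netip.AddrPort, bool, error) {
	parts := strings.Split(pair, " -> ")
	if len(parts) != 2 {
		return netip.AddrPort{}, false, fmt.Errorf("unexpected format: %q", pair)
	}
	dst, err := netip.ParseAddrPort(strings.TrimSpace(parts[1]))
	if err != nil {
		return netip.AddrPort{}, false, err
	}
	for _, p := range policy {
		// Unmap so an IPv4-mapped IPv6 destination is compared as IPv4.
		if p.Contains(dst.Addr().Unmap()) {
			return dst, true, nil
		}
	}
	return dst, false, nil
}

func main() {
	dst, excluded, err := excludedDst("172.24.89.188:36948 -> 172.18.89.5:3030")
	if err != nil {
		panic(err)
	}
	fmt.Printf("dst=%s excluded=%v\n", dst, excluded)
}
```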

3rgfbrgf · Nov 03 '25 14:11