matchArgs: NotDAddr filter unexpectedly drops valid events on tcp_sendmsg

Open erolg opened this issue 7 months ago • 5 comments

### What happened?

When using matchArgs with operator: NotDAddr in a tcp_sendmsg kprobe, events that should match are being silently dropped — even though the destination IP clearly does not fall within the excluded ranges.

This behavior does not occur when the matchArgs selector is removed.

Policy:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: connect
spec:
  kprobes:
    - call: tcp_sendmsg
      syscall: false
      args:
        - index: 0
          type: sock
      selectors:
        - matchArgs:
            - index: 0
              operator: NotDAddr
              values:
                - 10.232.0.0/14
                - 127.0.0.0/8
                - 169.0.0.0/8


Dropped example event:

📤 sendmsg project-x/app-8bd55445f-9l54x /usr/bin/java tcp 10.233.67.2:56604 -> 10.117.98.45:11210 bytes 0
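As a sanity check on the claim above, here is a minimal Python sketch that uses the standard `ipaddress` module to test the addresses from the dropped event against the three ranges in the policy. It only checks CIDR membership in user space and makes no claim about how Tetragon evaluates NotDAddr in the kernel; the helper name `in_excluded` is made up for illustration.

```python
import ipaddress

# CIDR ranges copied from the NotDAddr selector in the policy above.
EXCLUDED = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.232.0.0/14", "127.0.0.0/8", "169.0.0.0/8")
]


def in_excluded(addr: str) -> bool:
    """Return True if addr falls inside any of the excluded ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in EXCLUDED)


# Addresses taken from the dropped example event.
daddr = "10.117.98.45"
saddr = "10.233.67.2"

print("daddr in excluded ranges:", in_excluded(daddr))
print("saddr in excluded ranges:", in_excluded(saddr))
```

The destination address lies outside every listed range, so a NotDAddr selector would be expected to match this event. The source address, on the other hand, does fall inside 10.232.0.0/14; whether that plays any role in the drop is not established in this thread.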

### Tetragon Version

1.4.0

### Kernel Version

5.15.0-117-generic

### Kubernetes Version

v1.20.7

### Bugtool

_No response_

### Relevant log output

```shell

```

### Anything else?

_No response_

erolg avatar May 07 '25 19:05 erolg

@erolg Hi, I created almost the same policy, but with tcp_connect:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "tcp-connect"
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"
    selectors:
      - matchArgs:
          - index: 0 
            operator: NotDAddr
            values:
              - 10.232.0.0/14

After executing nc 10.117.98.45 12345, I get the correct event:

{
  "process_kprobe": {
    "process": {
      "exec_id": "bWluaWt1YmU6MzcyMDM1NTI4ODc5MjMyOjM0MTk2MzE=",
      "pid": 3419631,
      "uid": 1001,
      "binary": "/usr/bin/nc",
      "arguments": "10.117.98.45 12345",
      "flags": "execve clone",
      "start_time": "2025-05-07T22:18:11.369894392Z",
      "auid": 1001,
      "parent_exec_id": "bWluaWt1YmU6MzcxNTY0NjI0MTEzMTY4OjM0MTQzMDY=",
      "refcnt": 1,
      "cap": {},
      "tid": 3419631,
      "in_init_tree": false
    },
    "parent": {
      "exec_id": "bWluaWt1YmU6MzcxNTY0NjI0MTEzMTY4OjM0MTQzMDY=",
      "pid": 3414306,
      "uid": 1001,
      "binary": "/usr/bin/zsh",
      "flags": "execve clone",
      "start_time": "2025-05-07T22:10:20.465126987Z",
      "auid": 1001,
      "parent_exec_id": "bWluaWt1YmU6MzcxNDE5OTA5OTU4MjczOjM0MTI2MzQ=",
      "cap": {},
      "tid": 3414306,
      "in_init_tree": false
    },
    "function_name": "tcp_connect",
    "args": [
      {
        "sock_arg": {
          "family": "AF_INET",
          "type": "SOCK_STREAM",
          "protocol": "IPPROTO_TCP",
          "saddr": "192.168.0.111",
          "daddr": "10.117.98.45",
          "sport": 52396,
          "dport": 12345,
          "cookie": "18446638557553461568",
          "state": "TCP_SYN_SENT"
        }
      }
    ],
    "action": "KPROBE_ACTION_POST",
    "policy_name": "tcp-connect",
    "return_action": "KPROBE_ACTION_POST"
  },
  "node_name": "minikube",
  "time": "2025-05-07T22:18:11.373030579Z"
}

When I set the operator to DAddr instead, I don't get the event, which seems correct.

kobrineli avatar May 07 '25 22:05 kobrineli

Yes, I can confirm that I also receive events with this policy. However, some of them are silently dropped, even though they clearly should match the NotDAddr condition.

This leads me to think the issue may not be with the selector logic itself, but rather with side effects of the selector when used in high-frequency kprobes like tcp_sendmsg.

Worth noting: when I remove the matchArgs block, I start receiving all tcp_sendmsg events again, including the exact flows that were previously missing.
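One way to pin down which flows go missing is to compare events from the filtered and unfiltered policies within a single capture. The Python sketch below assumes that both policies can be loaded side by side, that the capture is JSON lines with the `process_kprobe`, `policy_name`, and `sock_arg` fields seen in the sample event earlier in the thread, and that the unfiltered policy is named `connect-all` (a hypothetical name; only `connect` appears in this issue).

```python
import ipaddress
import json

# CIDR ranges copied from the NotDAddr selector in the original policy.
EXCLUDED = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.232.0.0/14", "127.0.0.0/8", "169.0.0.0/8")
]


def flows_by_policy(lines):
    """Group (saddr, sport, daddr, dport) tuples by policy_name."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        kprobe = json.loads(line).get("process_kprobe")
        if not kprobe:
            continue  # not a kprobe event
        for arg in kprobe.get("args", []):
            sock = arg.get("sock_arg")
            if sock:
                flow = (sock["saddr"], sock["sport"], sock["daddr"], sock["dport"])
                result.setdefault(kprobe.get("policy_name", ""), set()).add(flow)
    return result


def unexpectedly_missing(lines, filtered="connect", unfiltered="connect-all"):
    """Flows seen only by the unfiltered policy whose daddr is NOT excluded.

    "connect-all" is a hypothetical policy name used for illustration.
    """
    by_policy = flows_by_policy(lines)
    missing = by_policy.get(unfiltered, set()) - by_policy.get(filtered, set())
    return sorted(
        flow
        for flow in missing
        if not any(ipaddress.ip_address(flow[2]) in net for net in EXCLUDED)
    )
```

Any flow returned by `unexpectedly_missing` was reported without the selector, was not reported with it, and has a destination outside the excluded ranges, which is the behavior described in this issue. Comparing within one capture avoids mismatches caused by ephemeral source ports changing between runs.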

erolg avatar May 08 '25 15:05 erolg

I'm trying to monitor dropped but valid TCP events, so I removed the selector section from the following TracingPolicy:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade
  name: connect
spec:
  kprobes:
    - call: tcp_sendmsg
      syscall: false
      return: false
      args:
        - index: 0
          type: sock

However, once I remove the matchArgs section, I start seeing Tetragon's own TCP events, even though I’ve configured both export-allowlist and export-denylist as follows:

export-allowlist: '{"event_set":["PROCESS_KPROBE"]}'
export-denylist: |-
  {"event_set": ["PROCESS_EXIT"]}
  {"health_check": true}
  {"namespace": ["", "cilium", "kube-system", "tetragon"]}

Yet I still see events like:

📤 sendmsg tetragon/tetragon-2xkhn /usr/bin/tetragon tcp 127.0.0.1:54321 -> 127.0.0.1:52400 bytes 0
🚀 process k8s-p-discovery-p2-25mars-worker-1 /usr/bin/runc ...
💥 exit    k8s-p-discovery-p2-25mars-worker-1 /usr/bin/runc ... 0

Shouldn't these events be suppressed by the export-denylist? Am I missing something, or is there an additional filter I should apply?

erolg avatar May 08 '25 16:05 erolg

> Am I missing something, or is there an additional filter I should apply?

The export denylist only affects the export file. You seem to be using the tetra CLI above, where the export denylist has no effect. You can pass similar filters to tetra getevents via the appropriate flags.
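If no suitable flag is available, the same effect can be approximated on the client side. The Python sketch below mimics the allowlist and namespace denylist quoted earlier by filtering JSON-lines events, such as those one might pipe from tetra getevents -o json. It assumes the pod namespace sits at `process.pod.namespace` under the event-type key, a layout that does not appear in the sample event in this thread, and the function names are made up for illustration.

```python
import json

# Namespaces taken from the export-denylist quoted earlier in the thread.
DENIED_NAMESPACES = {"", "cilium", "kube-system", "tetragon"}


def event_namespace(event: dict) -> str:
    """Return the pod namespace of an event, or "" when no pod is attached.

    Assumption: the process is stored under the event-type key
    (e.g. "process_kprobe") as process.pod.namespace.
    """
    for key, body in event.items():
        if key.startswith("process_") and isinstance(body, dict):
            pod = body.get("process", {}).get("pod") or {}
            return pod.get("namespace", "")
    return ""


def keep(event: dict) -> bool:
    """Keep only kprobe events whose namespace is not on the denylist."""
    return "process_kprobe" in event and event_namespace(event) not in DENIED_NAMESPACES


def filter_events(lines):
    """Yield the parsed events from JSON lines that pass keep()."""
    for line in lines:
        line = line.strip()
        if line:
            event = json.loads(line)
            if keep(event):
                yield event


# Small self-contained demo with made-up events.
sample = [
    json.dumps({"process_kprobe": {"process": {"pod": {"namespace": "tetragon"}}}}),
    json.dumps({"process_kprobe": {"process": {"pod": {"namespace": "project-x"}}}}),
    json.dumps({"process_exit": {"process": {"pod": {"namespace": "project-x"}}}}),
]
for event in filter_events(sample):
    print(json.dumps(event))
```

To apply this to a live stream, `filter_events` could be given `sys.stdin` instead of the sample list. Note that an event without a pod is treated as namespace "", which this denylist also drops.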

kkourt avatar May 15 '25 06:05 kkourt

> The export denylist only affects the export file.

It's been confusing so many people: https://github.com/cilium/tetragon/issues/3742

mtardy avatar May 15 '25 16:05 mtardy

Hey @erolg, did you eventually find the cause of the issue? If not, could you provide more steps to reproduce it so that we can investigate on our side? Thank you!

mtardy avatar Jul 15 '25 16:07 mtardy