
Undetected container/k8s information leads to alert fatigue caused by k8s_containers macro

Open aberezovski opened this issue 1 year ago • 5 comments

Hello Falco team,

During evaluating Falco on AKS k8s clusters, my team and I observed a continuous alert generation triggered by rule Contact K8S API Server From Container.

The huge number of generated alerts appears to be false positives: the macro k8s_containers is evaluated incorrectly because the attribute container.image.repository has no value.

Describe the bug

Once least-privileged Falco was deployed to an Azure AKS cluster, the rule Contact K8S API Server From Container started generating alerts every second. The generated alerts do not provide any container/k8s information, so no tracing could be performed.

The only valuable piece of information is the internal IP address of the k8s pod that initiated the connection, but that was not enough, and it gave the Falco engine nothing to avoid triggering the alert.
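
For context, the upstream rule and macro have roughly the following shape (an abridged, paraphrased sketch, not the exact upstream definitions; the image list shown here is only illustrative). It shows why a missing container.image.repository prevents the k8s_containers exception from matching, so the rule fires:

# Paraphrased sketch of the upstream definitions (image list illustrative, not verbatim).
# k8s_containers whitelists known platform containers by image repository or namespace.
- macro: k8s_containers
  condition: >
    (container.image.repository in (registry.k8s.io/kube-proxy, docker.io/calico/node,
     falcosecurity/falco)
     or k8s.ns.name = kube-system)

# The rule matches outbound connections to the API server from any container that is
# NOT matched by k8s_containers. With container.image.repository missing (as in the
# alerts shown below), the whitelist cannot match and the rule triggers on every connection.
- rule: Contact K8S API Server From Container
  desc: Detect attempts to contact the K8s API Server from a container.
  condition: >
    evt.type=connect and evt.dir=< and (fd.typechar=4 or fd.typechar=6)
    and container and not k8s_containers and k8s_api_server
  output: Unexpected connection to K8s API Server from container (...)
  priority: NOTICE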

How to reproduce it

  1. Deploy least-privileged Falco in an AKS cluster
  2. Check the logs generated by the falco pod/container
  3. Notice the alerts generated by the falco rule Contact K8S API Server From Container. Those alerts are generated every second.
  4. Notice that the alerts do not provide any container/k8s information. Even the container.id attribute contains an empty string as its value: "container.id": ""

Expected behaviour

All container/k8s information, including container.id, has to be properly detected; as a result, the macro k8s_containers should evaluate successfully and no false positive alerts should be generated.

Evidences

Alerts similar to the one listed below are generated in the falco pod every second and, as a result, could lead to alert fatigue.

{
    "hostname": "aks-lhxynh188x-14626312-vmss000014",
    "output": "09:55:34.482587748: Notice Unexpected connection to K8s API Server from container (connection=10.244.11.4:39472->192.168.0.1:443 lport=443 rport=39472 fd_type=ipv4 fd_proto=fd.l4proto evt_type=connect user=<NA> user_uid=4294967295 user_loginuid=-1 process=<NA> proc_exepath= parent=<NA> command=<NA> terminal=0 container_id= container_image=<NA> container_image_tag=<NA> container_name=<NA> k8s_ns=<NA> k8s_pod_name=<NA>)",
    "priority": "Notice",
    "rule": "Contact K8S API Server From Container",
    "source": "syscall",
    "tags": [
        "T1565",
        "container",
        "k8s",
        "maturity_stable",
        "mitre_discovery",
        "network"
    ],
    "time": "2024-06-20T09:55:34.482587748Z",
    "output_fields": {
        "container.id": "",
        "container.image.repository": null,
        "container.image.tag": null,
        "container.name": null,
        "evt.time": 1718877334482587748,
        "evt.type": "connect",
        "fd.lport": 443,
        "fd.name": "10.244.11.4:39472->192.168.0.1:443",
        "fd.rport": 39472,
        "fd.type": "ipv4",
        "k8s.ns.name": null,
        "k8s.pod.name": null,
        "proc.cmdline": "<NA>",
        "proc.exepath": "",
        "proc.name": "<NA>",
        "proc.pname": null,
        "proc.tty": 0,
        "user.loginuid": -1,
        "user.name": "<NA>",
        "user.uid": 4294967295
    }
}

Environment

Falco deployed on AKS cluster using Falco Helm Chart version 4.4.2.

  • Falco version: 0.38.0 (x86_64)

  • System info:

{
  "machine": "x86_64",
  "nodename": "falco-8b7db",
  "release": "5.15.0-1064-azure",
  "sysname": "Linux",
  "version": "#73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024"
}
  • Cloud provider or hardware configuration: Azure
  • OS:
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel:
Linux falco-8b7db 5.15.0-1064-azure #73-Ubuntu SMP Tue Apr 30 14:24:24 UTC 2024 x86_64 GNU/Linux
  • Installation method: Deployed to the k8s cluster as a DaemonSet using Helm Chart version 4.4.2 with the following custom values YAML file:
driver:
  enabled: true
  kind: modern_ebpf
  modernEbpf:
    leastPrivileged: true

tty: true

falco:
  json_output: true
  json_include_output_property: true

image:
  pullPolicy: Always
  repository: falcosecurity/falco-distroless

extra:
  args:
  - --disable-cri-async

aberezovski avatar Jun 20 '24 15:06 aberezovski

Thanks @aberezovski. The fact that container.id is empty is strange, since we fetch the cgroup info in the kernel and it should always be there. Also a bunch of other data is missing.

For other rules and other events the container.id is not an empty string?

[The container rules macro only checks that it is not equal to "host"; we may want to also add a check for empty strings. But that does not solve the root problem.]
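
A minimal sketch of what such an empty-string guard could look like as a local rules override (assuming the upstream container macro is simply container.id != host, as noted above, and assuming the override syntax available in recent Falco versions):

# Hypothetical local override (e.g. in falco_rules.local.yaml), not an upstream change:
# treat events with an empty container.id as non-container events, so that
# container-scoped rules stay quiet when container metadata could not be resolved.
- macro: container
  condition: (container.id != host and container.id != "")
  override:
    condition: replace

This would only suppress the noisy alerts; it does not address why the container metadata is missing in the first place.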

incertum avatar Jun 20 '24 16:06 incertum

Hi @incertum,

Regarding your question above ("For other rules and other events the container.id is not an empty string?"): I observed that weird behaviour only for alerts generated by the rule Contact K8S API Server From Container, and ONLY when falco is running as a non-privileged container.

For other rules and events I observed random behaviour: only the container.id value is shown, while all other container, pod and k8s namespace attributes are missing. See the other issue I opened for that: #3256

aberezovski avatar Jun 21 '24 10:06 aberezovski

I observed that weird behaviour only for alerts generated by the rule Contact K8S API Server From Container, and ONLY when falco is running as a non-privileged container.

Thanks for opening the new issue, best to discuss it there. I tagged some other maintainers.

For other rules and events I observed random behaviour: only the container.id value is shown, while all other container, pod and k8s namespace attributes are missing.

Known imperfection. There are other open issues. Given we do API calls there can be lookup time delays, plus we still need to improve the container engine in general. It is on my plate for the Falco 0.39.0 dev cycle.

incertum avatar Jun 21 '24 17:06 incertum

I am also seeing the same message (with different IPs, ports and timestamps but with the same user_uid) when I use --set driver.modernEbpf.leastPrivileged=true. I also tried to set the capabilities manually to something like --set-json containerSecurityContext='{"capabilities":{"add":["CHOWN","DAC_OVERRIDE","DAC_READ_SEARCH","FOWNER","FSETID","KILL","SETGID","SETUID","SETPCAP","LINUX_IMMUTABLE","NET_BIND_SERVICE","NET_BROADCAST","NET_ADMIN","NET_RAW","IPC_LOCK","IPC_OWNER","SYS_MODULE","SYS_RAWIO","SYS_CHROOT","SYS_PTRACE","SYS_PACCT","SYS_ADMIN","SYS_BOOT","SYS_NICE","SYS_RESOURCE","SYS_TIME","SYS_TTY_CONFIG","MKNOD","LEASE","AUDIT_WRITE","AUDIT_CONTROL","SETFCAP","MAC_OVERRIDE","MAC_ADMIN","SYSLOG","WAKE_ALARM","BLOCK_SUSPEND","AUDIT_READ","PERFMON","BPF","CHECKPOINT_RESTORE"]}}' just to check if there is something missing there, but it did not fix the error. If I use --set driver.modernEbpf.leastPrivileged=false I do not see this error.
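
For readability, the same settings expressed as a Helm values fragment rather than --set flags (a plain transcription of the flags above, not a recommended configuration):

# Values-file equivalent of the --set flags quoted above (transcription only).
driver:
  modernEbpf:
    leastPrivileged: true   # with false, the alerts no longer appear

containerSecurityContext:
  capabilities:
    add: [CHOWN, DAC_OVERRIDE, DAC_READ_SEARCH, FOWNER, FSETID, KILL, SETGID,
          SETUID, SETPCAP, LINUX_IMMUTABLE, NET_BIND_SERVICE, NET_BROADCAST,
          NET_ADMIN, NET_RAW, IPC_LOCK, IPC_OWNER, SYS_MODULE, SYS_RAWIO,
          SYS_CHROOT, SYS_PTRACE, SYS_PACCT, SYS_ADMIN, SYS_BOOT, SYS_NICE,
          SYS_RESOURCE, SYS_TIME, SYS_TTY_CONFIG, MKNOD, LEASE, AUDIT_WRITE,
          AUDIT_CONTROL, SETFCAP, MAC_OVERRIDE, MAC_ADMIN, SYSLOG, WAKE_ALARM,
          BLOCK_SUSPEND, AUDIT_READ, PERFMON, BPF, CHECKPOINT_RESTORE]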

EDIT: I forgot to mention that I believe these log lines are generated for falco itself, i.e. when falco tries to fetch metadata from the K8s API server to enrich the system calls.

vassap2022 avatar Jun 28 '24 18:06 vassap2022

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Sep 26 '24 22:09 poiana

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana avatar Oct 26 '24 22:10 poiana

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community. /close

poiana avatar Nov 25 '24 22:11 poiana

@poiana: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

poiana avatar Nov 25 '24 22:11 poiana