Falco Pod Restarts
Describe the bug
Once every few days, we get a random pod restart. The restart always happens about a minute after the pod starts, the observed status is Completed, and there are no logs explaining the reason:
Resource usage is fine; this is unlike the OOM errors that sometimes occur.
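When a pod restarts with no obvious cause, the container's last termination state (reason, exit code, signal) usually tells the story. A minimal sketch, using the pod and namespace names from this thread and the standard kubectl jsonpath syntax:

```shell
# Show why the falco container last terminated.
# Pod name falco-xpzws is taken from this thread; adjust to your pod.
kubectl -n falco get pod falco-xpzws \
  -o jsonpath='{.status.containerStatuses[?(@.name=="falco")].lastState.terminated}'
```

A `reason` of `OOMKilled` (exit code 137) would confirm memory pressure; `Completed` with exit code 0 would match the clean SIGINT shutdown seen in the logs.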
How to reproduce it
Expected behaviour
No restarts
Screenshots
* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.36.1, driver version=6.0.1+driver, arch=aarch64, kernel release=5.10.179-168.710.amzn2.aarch64, kernel version=1
* Running falco-driver-loader with: driver=bpf, compile=yes, download=yes
* Mounting debugfs
* Filename 'falco_amazonlinux2_5.10.179-168.710.amzn2.aarch64_1.o' is composed of:
- driver name: falco
- target identifier: amazonlinux2
- kernel release: 5.10.179-168.710.amzn2.aarch64
- kernel version: 1
* Skipping download, eBPF probe is already present in /root/.falco/.....
* Skipping compilation, eBPF probe is already present in /root/.falco/....
* eBPF probe located in /root/.falco/6.0.1+driver/....
* Success: eBPF probe symlinked to /root/.falco/falco-bpf.o
Mon Nov 13 13:17:31 2023: Falco version: 0.36.1 (aarch64)
Mon Nov 13 13:17:31 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Mon Nov 13 13:17:31 2023: Loading rules from file /etc/falco/falco_rules.yaml
Environment
- Falco version:
falco version=0.36.1, driver version=6.0.1+driver, arch=aarch64, kernel release=5.10.179-168.710.amzn2.aarch64, kernel version=1
- Installation method: helm
Additional context
Hi @omfurman-ma, could you share the logs of the restarted pod using kubectl logs falco-pod-xyz --previous?
@alacuku
kubectl -n falco logs falco-xpzws --previous
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falcoctl-artifact-install (init)
* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.36.1, driver version=6.0.1+driver, arch=x86_64, kernel release=5.10.179-168.710.amzn2.x86_64, kernel version=1
* Running falco-driver-loader with: driver=bpf, compile=yes, download=yes
* Mounting debugfs
* Filename 'falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o' is composed of:
- driver name: falco
- target identifier: amazonlinux2
- kernel release: 5.10.179-168.710.amzn2.x86_64
- kernel version: 1
* Trying to download a prebuilt eBPF probe from https://download.falco.org/driver/6.0.1%2Bdriver/x86_64/falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o
* Skipping compilation, eBPF probe is already present in /root/.falco/6.0.1+driver/x86_64/falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o
* eBPF probe located in /root/.falco/6.0.1+driver/x86_64/falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o
* Success: eBPF probe symlinked to /root/.falco/falco-bpf.o
Sat Nov 18 14:42:55 2023: Falco version: 0.36.1 (x86_64)
Sat Nov 18 14:42:55 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Sat Nov 18 14:42:55 2023: Loading rules from file /etc/falco/falco_rules.yaml
Sat Nov 18 14:42:55 2023: Loading rules from file /etc/falco/rules.d/custom-ma-secops.yaml
Sat Nov 18 14:42:55 2023: Loading rules from file /etc/falco/rules.d/rules-exceptions.yaml
Sat Nov 18 14:42:55 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Sat Nov 18 14:42:55 2023: Starting health webserver with threadiness 32, listening on port 8765
Sat Nov 18 14:42:55 2023: Loaded event sources: syscall
Sat Nov 18 14:42:55 2023: Enabled event sources: syscall
Sat Nov 18 14:42:55 2023: Opening 'syscall' source with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
Sat Nov 18 14:44:32 2023: SIGINT received, exiting...
Syscall event drop monitoring:
- event drop detected: 0 occurrences
- num times actions taken: 0
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
It seems the system is killing Falco. Are you sure it's not being OOM-killed?
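One way to tell an OOM kill from the clean shutdown shown in the log above is the container's last exit code, following the Linux convention of 128 + signal number. A small sketch (the helper name is hypothetical; the kubectl command in the comment shows where the real value would come from):

```shell
# Interpret a container's last exit code (128 + signal number convention).
interpret_exit() {
  case "$1" in
    0)   echo "Completed: clean exit (matches the SIGINT shutdown in the log above)" ;;
    137) echo "SIGKILL (128+9): often the OOM killer" ;;
    143) echo "SIGTERM (128+15): terminated, e.g. by the kubelet" ;;
    *)   echo "exit code $1" ;;
  esac
}

# In a real cluster the code would come from, e.g.:
#   kubectl -n falco get pod falco-xpzws -o \
#     jsonpath='{.status.containerStatuses[?(@.name=="falco")].lastState.terminated.exitCode}'
interpret_exit 0
interpret_exit 137
```

Since Falco logged "SIGINT received, exiting..." and exited cleanly, the pod status reads Completed rather than OOMKilled, which suggests something (likely the kubelet or a probe failure) asked it to stop rather than the OOM killer.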
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://github.com/falcosecurity/community.
/close
@poiana: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.