Falco Pod Restarts
Describe the bug
Once every few days, we get a random pod restart. The restart always happens about a minute after the pod starts, the observed status is Completed, and there are no logs explaining the reason:
Resource usage is fine; this is unlike the OOM errors that sometimes occur.
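When a pod restarts with no obvious cause, the container's last termination state (reason, exit code, signal) usually tells the story. A minimal sketch, using the pod and namespace names from this thread and the standard kubectl jsonpath syntax:

```shell
# Show why the falco container last terminated.
# Pod name falco-xpzws is taken from this thread; adjust to your pod.
kubectl -n falco get pod falco-xpzws \
  -o jsonpath='{.status.containerStatuses[?(@.name=="falco")].lastState.terminated}'
```

A `reason` of `OOMKilled` (exit code 137) would confirm memory pressure; `Completed` with exit code 0 would match the clean SIGINT shutdown seen in the logs.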
How to reproduce it
Expected behaviour
No restarts
Screenshots
* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.36.1, driver version=6.0.1+driver, arch=aarch64, kernel release=5.10.179-168.710.amzn2.aarch64, kernel version=1
* Running falco-driver-loader with: driver=bpf, compile=yes, download=yes
* Mounting debugfs
* Filename 'falco_amazonlinux2_5.10.179-168.710.amzn2.aarch64_1.o' is composed of:
- driver name: falco
- target identifier: amazonlinux2
- kernel release: 5.10.179-168.710.amzn2.aarch64
- kernel version: 1
* Skipping download, eBPF probe is already present in /root/.falco/.....
* Skipping compilation, eBPF probe is already present in /root/.falco/....
* eBPF probe located in /root/.falco/6.0.1+driver/....
* Success: eBPF probe symlinked to /root/.falco/falco-bpf.o
Mon Nov 13 13:17:31 2023: Falco version: 0.36.1 (aarch64)
Mon Nov 13 13:17:31 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Mon Nov 13 13:17:31 2023: Loading rules from file /etc/falco/falco_rules.yaml
Environment
- Falco version:
falco version=0.36.1, driver version=6.0.1+driver, arch=aarch64, kernel release=5.10.179-168.710.amzn2.aarch64, kernel version=1
- Installation method: helm
Additional context
Hi @omfurman-ma, could you share the logs of the restarted pod using kubectl logs falco-pod-xyz --previous?
@alacuku
kubectl -n falco logs falco-xpzws --previous
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falcoctl-artifact-install (init)
* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.36.1, driver version=6.0.1+driver, arch=x86_64, kernel release=5.10.179-168.710.amzn2.x86_64, kernel version=1
* Running falco-driver-loader with: driver=bpf, compile=yes, download=yes
* Mounting debugfs
* Filename 'falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o' is composed of:
- driver name: falco
- target identifier: amazonlinux2
- kernel release: 5.10.179-168.710.amzn2.x86_64
- kernel version: 1
* Trying to download a prebuilt eBPF probe from https://download.falco.org/driver/6.0.1%2Bdriver/x86_64/falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o
* Skipping compilation, eBPF probe is already present in /root/.falco/6.0.1+driver/x86_64/falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o
* eBPF probe located in /root/.falco/6.0.1+driver/x86_64/falco_amazonlinux2_5.10.179-168.710.amzn2.x86_64_1.o
* Success: eBPF probe symlinked to /root/.falco/falco-bpf.o
Sat Nov 18 14:42:55 2023: Falco version: 0.36.1 (x86_64)
Sat Nov 18 14:42:55 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Sat Nov 18 14:42:55 2023: Loading rules from file /etc/falco/falco_rules.yaml
Sat Nov 18 14:42:55 2023: Loading rules from file /etc/falco/rules.d/custom-ma-secops.yaml
Sat Nov 18 14:42:55 2023: Loading rules from file /etc/falco/rules.d/rules-exceptions.yaml
Sat Nov 18 14:42:55 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Sat Nov 18 14:42:55 2023: Starting health webserver with threadiness 32, listening on port 8765
Sat Nov 18 14:42:55 2023: Loaded event sources: syscall
Sat Nov 18 14:42:55 2023: Enabled event sources: syscall
Sat Nov 18 14:42:55 2023: Opening 'syscall' source with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
Sat Nov 18 14:44:32 2023: SIGINT received, exiting...
Syscall event drop monitoring:
- event drop detected: 0 occurrences
- num times actions taken: 0
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
It seems the system is killing Falco. Are you sure it's not being OOM-killed?
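One way to tell an OOM kill from the clean shutdown shown in the log above is the container's last exit code, following the Linux convention of 128 + signal number. A small sketch (the helper name is hypothetical; the kubectl command in the comment shows where the real value would come from):

```shell
# Interpret a container's last exit code (128 + signal number convention).
interpret_exit() {
  case "$1" in
    0)   echo "Completed: clean exit (matches the SIGINT shutdown in the log above)" ;;
    137) echo "SIGKILL (128+9): often the OOM killer" ;;
    143) echo "SIGTERM (128+15): terminated, e.g. by the kubelet" ;;
    *)   echo "exit code $1" ;;
  esac
}

# In a real cluster the code would come from, e.g.:
#   kubectl -n falco get pod falco-xpzws -o \
#     jsonpath='{.status.containerStatuses[?(@.name=="falco")].lastState.terminated.exitCode}'
interpret_exit 0
interpret_exit 137
```

Since Falco logged "SIGINT received, exiting..." and exited cleanly, the pod status reads Completed rather than OOMKilled, which suggests something (likely the kubelet or a probe failure) asked it to stop rather than the OOM killer.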
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://github.com/falcosecurity/community.
/close
@poiana: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.