falco 0.31.1 occasionally unpacking fd.sip incorrectly on kernel 4.15.0?
Describe the bug
I don't know if anyone else has reported something similar, but after we upgraded from Falco 0.30.0 to 0.31.1 (we use the BPF probe), Falco started occasionally triggering events from container processes with unusual fd.sip IP addresses:
62.127.0.0 115.127.0.0 185.127.0.0 238.127.0.0 252.127.0.0
It looks as though the syscall information is being unpacked/decoded incorrectly, with some bitmask or pointer address being misinterpreted.
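One observation that may be consistent with the pointer guess (an assumption on my part, not a confirmed root cause): if the four bytes of each address are reinterpreted as a little-endian 32-bit integer, every value has the form 0x00007fNN, which is what the upper half of a typical x86_64 userspace address looks like. A minimal sketch; the helper name `as_little_endian_u32` is hypothetical:

```python
import socket

# The unusual fd.sip values listed above.
observed = ["62.127.0.0", "115.127.0.0", "185.127.0.0",
            "238.127.0.0", "252.127.0.0"]

def as_little_endian_u32(ip: str) -> int:
    """Reinterpret the 4 address bytes as a little-endian 32-bit integer."""
    return int.from_bytes(socket.inet_aton(ip), "little")

for ip in observed:
    # e.g. 62.127.0.0 is the byte sequence 3e 7f 00 00, i.e. 0x00007f3e
    print(f"{ip:<12} -> 0x{as_little_endian_u32(ip):08x}")
```

This does not prove anything about where the value comes from; it only shows that the bytes are not random.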
How to reproduce it
Not clear at this time.
Expected behaviour
Falco would retrieve the syscall information accurately.
Screenshots
N/A
Environment
- Falco version: 0.31.1
- System info:
{"machine":"x86_64","nodename":"XXXXXXXX","release":"4.15.0-176-generic","sysname":"Linux","version":"#185-Ubuntu SMP Tue Mar 29 17:40:04 UTC 2022"}
- Cloud provider or hardware configuration:
- OS: Ubuntu 18.04.6 LTS
- Kernel: 4.15.0-176-generic
- Installation method: Kubernetes
Additional context
Rolling back to 0.30.0 with the same ruleset resolves the issue
I understand there isn't much information to go on here, but I thought I should raise an issue to have something recorded and I'm happy to capture whatever additional debug might be useful here
/cc @FedeDP /cc @Andreagit97 /cc @mstemm
Thank you @dnwe, we will try to understand what is going on here... If you find further information about the issue, please add it here :)
Hi @dnwe ! Can you share the rules that trigger this issue? Thank you!
@FedeDP there's nothing special in the rules really, they're simply copies of the samples for allowlisting outgoing connections to a range of port and IP combinations, with a tweaked output format and a set of permitted CIDR and port combinations for each host type:
https://github.com/falcosecurity/falco/blob/9392c0295a62d2e3f833f29d5d543cb99bb8b3a1/rules/falco_rules.yaml#L387-L396
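To illustrate why a wrongly decoded fd.sip trips this kind of rule, here is a rough Python sketch of the allowlisting idea. It is not Falco's implementation, and the CIDRs, ports, and the `is_allowed` helper are all made-up placeholders rather than our real configuration:

```python
import ipaddress

# Hypothetical allowlist: (permitted CIDR, permitted destination ports).
ALLOWED = [
    (ipaddress.ip_network("10.0.0.0/8"), {9093}),
    (ipaddress.ip_network("172.16.0.0/12"), {443, 9090}),
]

def is_allowed(dest_ip: str, dest_port: int) -> bool:
    """True if the destination matches any permitted CIDR and port combination."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net and dest_port in ports for net, ports in ALLOWED)

print(is_allowed("10.1.2.3", 9093))     # inside an allowlisted CIDR and port
print(is_allowed("254.127.0.0", 9093))  # bogus address falls outside every CIDR
```

Any destination that decodes to an address outside the permitted ranges is reported, even when the real connection was legitimate.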
These have worked fine since they were first introduced into our env with falco 0.17.0 and the kernel probe approx 3 years ago, and have continued in use through various upgrades (0.23.x / 0.26.x etc.) up until 0.30.0 without any noticeable issue. Approx 7 months ago, at 0.30.0, we also migrated to using the BPF probe and similarly saw no issues or regressions at that point in time.
However, after rolling out the 0.31.1 upgrade in our staging envs approx 2 weeks ago, we started seeing our alerting fire for these various unusual fd.sip addresses:
May 20 16:41:17 falco-29qj6 disallowed outbound connection destination was detected: podname=kafka-proxy-b8f7bc696-64zgq namespace=data src=172.x.x.x sport=31126 dest=254.127.0.0 dport=9093
May 24 07:16:17 falco-nqx26 disallowed outbound connection destination was detected: podname=prometheus-1 namespace=prometheus src=172.x.x.x sport=36246 dest=185.127.0.0 dport=64429
May 25 08:36:47 falco-4rmz2 disallowed outbound connection destination was detected: podname=prometheus-2 namespace=prometheus src=172.x.x.x sport=32541 dest=115.127.0.0 dport=700
The output field format for the event is:
podname=%k8s.pod.name namespace=%k8s.ns.name src=%fd.cip sport=%fd.cport dest=%fd.sip dport=%fd.sport
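In case it helps anyone collecting similar data, below is a small sketch of how alert lines in this format could be parsed and checked for the odd x.127.0.0 destinations. The regular expressions and the helper names (`parse_alert`, `looks_misdecoded`) are my own assumptions based on the format above, not part of Falco:

```python
import re

# Matches the key=value fields of the output format shown above.
ALERT_RE = re.compile(
    r"podname=(?P<podname>\S+) namespace=(?P<namespace>\S+) "
    r"src=(?P<src>\S+) sport=(?P<sport>\d+) "
    r"dest=(?P<dest>\S+) dport=(?P<dport>\d+)"
)
# Destinations of the form <n>.127.0.0, as seen in the alerts in this issue.
SUSPICIOUS_DEST = re.compile(r"^\d{1,3}\.127\.0\.0$")

def parse_alert(line: str):
    """Return the alert fields as a dict, or None if the line does not match."""
    m = ALERT_RE.search(line)
    return m.groupdict() if m else None

def looks_misdecoded(alert: dict) -> bool:
    """True if the destination matches the unusual pattern from this issue."""
    return bool(SUSPICIOUS_DEST.match(alert["dest"]))

sample = (
    "May 20 16:41:17 falco-29qj6 disallowed outbound connection destination "
    "was detected: podname=kafka-proxy-b8f7bc696-64zgq namespace=data "
    "src=172.x.x.x sport=31126 dest=254.127.0.0 dport=9093"
)
alert = parse_alert(sample)
print(alert["podname"], alert["dest"], looks_misdecoded(alert))
```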
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
/remove-lifecycle rotten
/help
@leogr: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
/lifecycle stale
/remove-lifecycle stale
Hey @dnwe
Is this still an issue? :thinking:
@leogr I need to re-test on 0.33.1, we have been pinned on 0.30.0 in the interim
Thank you!
/lifecycle stale
Possibly related to this patch: https://github.com/falcosecurity/libs/pull/1059
It should ship with Falco 0.35; I will ping you to test it once it is released, if you are still interested :) Thank you! /remove-lifecycle stale
@FedeDP awesome! thank you
/lifecycle stale
We have been running Falco 0.35 since 8th June without seeing any recurrence of this issue. Closing as fixed, thanks all.