falco 0.31.1 occasionally unpacking fd.sip incorrectly on kernel 4.15.0?

Open dnwe opened this issue 3 years ago • 6 comments

Describe the bug

I don't know whether anyone else has reported something similar, but after we upgraded from Falco 0.30.0 to 0.31.1 (we use the BPF probe), Falco started occasionally triggering events from container processes with unusual fd.sip IP addresses:

62.127.0.0 115.127.0.0 185.127.0.0 238.127.0.0 252.127.0.0

It looks as though Falco is unpacking/decoding the syscall info wrongly, with some bitmask or pointer address being misinterpreted as the server IP.
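One way to probe the "pointer address" hypothesis is to look at the raw bytes behind the reported addresses. The sketch below is only an illustration and is not taken from the Falco code base; the helper name `as_le_u32` is hypothetical. It reinterprets each address's four bytes as a little-endian 32-bit integer, which is how an x86_64 machine would read them if they were actually a slice of some other value.

```python
import ipaddress
import struct

# Addresses reported in this issue.
SUSPECT = ["62.127.0.0", "115.127.0.0", "185.127.0.0", "238.127.0.0", "252.127.0.0"]


def as_le_u32(addr: str) -> int:
    """Reinterpret the 4 address bytes as a little-endian unsigned 32-bit integer."""
    raw = ipaddress.IPv4Address(addr).packed  # bytes in the order they are printed
    return struct.unpack("<I", raw)[0]


for addr in SUSPECT:
    print(f"{addr} -> 0x{as_le_u32(addr):08x}")
```

Every value comes out as `0x00007fXX` (for example `62.127.0.0` becomes `0x00007f3e`). That resembles the upper half of a typical 64-bit userspace address on x86_64 Linux, which is consistent with the guess above but does not prove where the bytes came from.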

How to reproduce it

Not clear at this time.

Expected behaviour

Falco would retrieve the syscall information accurately.

Screenshots

N/A

Environment

  • Falco version: 0.31.1
  • System info: {"machine":"x86_64","nodename":"XXXXXXXX","release":"4.15.0-176-generic","sysname":"Linux","version":"#185-Ubuntu SMP Tue Mar 29 17:40:04 UTC 2022"}
  • Cloud provider or hardware configuration:
  • OS: Ubuntu 18.04.6 LTS
  • Kernel: 4.15.0-176-generic
  • Installation method: Kubernetes

Additional context

Rolling back to 0.30.0 with the same ruleset resolves the issue

dnwe avatar May 25 '22 09:05 dnwe

I understand there isn't much information to go on here, but I thought I should raise an issue to have something recorded and I'm happy to capture whatever additional debug might be useful here

dnwe avatar May 25 '22 09:05 dnwe

/cc @FedeDP /cc @Andreagit97 /cc @mstemm

leogr avatar May 25 '22 09:05 leogr

Thank you @dnwe, we will try to understand what is going on here... If you find any further information about the issue, please add it here :)

Andreagit97 avatar May 25 '22 10:05 Andreagit97

Hi @dnwe ! Can you share the rules that trigger this issue? Thank you!

FedeDP avatar May 30 '22 12:05 FedeDP

@FedeDP there's nothing special in the rules really, they're simply copies of the samples for allowlisting outgoing connections to a range of port and IP combinations, with a tweaked output format and a set of permitted CIDR and port combinations for each host type:

https://github.com/falcosecurity/falco/blob/9392c0295a62d2e3f833f29d5d543cb99bb8b3a1/rules/falco_rules.yaml#L387-L396

These have worked fine since they were first introduced into our env with falco 0.17.0 and the kernel probe approx 3 years ago, and have continued in use through various upgrades (0.23.x / 0.26.x etc.) up until 0.30.0 without any noticeable issue. Approx 7 months ago, at 0.30.0, we also migrated to using the BPF module and similarly saw no issues or regressions at that point in time.

However, after rolling out the 0.31.1 upgrade in our staging envs approx 2 weeks ago, we started seeing our alerting fire for these various unusual fd.sip addresses:

May 20 16:41:17 falco-29qj6 disallowed outbound connection destination was detected: podname=kafka-proxy-b8f7bc696-64zgq namespace=data src=172.x.x.x sport=31126 dest=254.127.0.0 dport=9093 
May 24 07:16:17 falco-nqx26 disallowed outbound connection destination was detected: podname=prometheus-1 namespace=prometheus src=172.x.x.x sport=36246 dest=185.127.0.0 dport=64429
May 25 08:36:47 falco-4rmz2 disallowed outbound connection destination was detected: podname=prometheus-2 namespace=prometheus src=172.x.x.x sport=32541 dest=115.127.0.0 dport=700

The output field format for the event is:

podname=%k8s.pod.name namespace=%k8s.ns.name src=%fd.cip sport=%fd.cport dest=%fd.sip dport=%fd.sport
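For anyone triaging similar alerts, the following is a minimal sketch (not part of Falco) that parses lines in the output format above and flags destinations of the form `N.127.0.0`, the shape of every bogus address seen so far. The names `ALERT_RE`, `SUSPECT_DEST_RE` and `suspicious_alerts` are hypothetical, and the pattern is an assumption based only on the samples in this issue.

```python
import re

# Matches the key=value tail of the alert lines shown in this issue.
ALERT_RE = re.compile(
    r"podname=(?P<pod>\S+) namespace=(?P<ns>\S+) "
    r"src=(?P<src>\S+) sport=(?P<sport>\d+) "
    r"dest=(?P<dest>\S+) dport=(?P<dport>\d+)"
)

# Assumption: bogus destinations always look like N.127.0.0.
SUSPECT_DEST_RE = re.compile(r"^\d{1,3}\.127\.0\.0$")


def suspicious_alerts(lines):
    """Yield the parsed fields of every alert whose dest matches the suspect pattern."""
    for line in lines:
        match = ALERT_RE.search(line)
        if match and SUSPECT_DEST_RE.match(match["dest"]):
            yield match.groupdict()


if __name__ == "__main__":
    sample = [
        "May 24 07:16:17 falco-nqx26 disallowed outbound connection destination "
        "was detected: podname=prometheus-1 namespace=prometheus src=172.x.x.x "
        "sport=36246 dest=185.127.0.0 dport=64429",
    ]
    for alert in suspicious_alerts(sample):
        print(alert["pod"], alert["dest"], alert["dport"])
```

Feeding it the Falco log stream would separate these likely-misdecoded events from genuine policy violations while the root cause is investigated.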

dnwe avatar May 30 '22 13:05 dnwe

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Aug 28 '22 15:08 poiana

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana avatar Sep 27 '22 15:09 poiana

/remove-lifecycle rotten

Andreagit97 avatar Sep 29 '22 13:09 Andreagit97

/help

leogr avatar Oct 17 '22 16:10 leogr

@leogr: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

poiana avatar Oct 17 '22 16:10 poiana

/lifecycle stale

poiana avatar Jan 15 '23 21:01 poiana

/remove-lifecycle stale

FedeDP avatar Jan 15 '23 22:01 FedeDP

Hey @dnwe

Is this still an issue? :thinking:

leogr avatar Jan 18 '23 10:01 leogr

@leogr I need to re-test on 0.33.1, we have been pinned on 0.30.0 in the interim

dnwe avatar Jan 18 '23 10:01 dnwe

> @leogr I need to re-test on 0.33.1, we have been pinned on 0.30.0 in the interim

Thank you!

leogr avatar Jan 18 '23 10:01 leogr

/lifecycle stale

poiana avatar Apr 18 '23 13:04 poiana

Possibly related to this patch: https://github.com/falcosecurity/libs/pull/1059

It should ship with Falco 0.35; I will ping you to test it once it is out, if you still care :) Thank you! /remove-lifecycle stale

FedeDP avatar Apr 27 '23 10:04 FedeDP

@FedeDP awesome! thank you

dnwe avatar Apr 27 '23 10:04 dnwe

/lifecycle stale

poiana avatar Jul 26 '23 13:07 poiana

Have been running falco 0.35 since 8th June without seeing any recurrence of this issue. Closing as fixed, thanks all

dnwe avatar Jul 26 '23 15:07 dnwe