libs icon indicating copy to clipboard operation
libs copied to clipboard

fd.name some times contain bogus <UNKNOWN> part in "Write below root" event

Open petterreinholdtsen opened this issue 1 year ago • 20 comments

Some times the events from sandbox rule "Write below root" contain a incomplete path, making it hard to know the location of the file in question. This is an example message:

"14:27:16.402853438: Error File below / or /root opened for writing (file=<UNKNOWN>cf_lock.lmdb.lock evt_type=openat user=root user_uid=0 user_loginuid=-1 process=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O_RDWR container_id=host container_name=host)",

The process triggering the event is cfengine with its lock file, and in other similar events the fd.name value is "//cf_lock.lmdb.lock", so I suspect the real value of the <UNKNOWN> part should be "/".

This is running on a RHEL 9.3 machine using the "modern" ebpf probe, with a setup where all rule sets from https://github.com/falcosecurity/rules/ are enabled. The machine uses cfengine to keep the setup correct and up to date.

I expected fd.name to contain a full path to the opened file, and suspect there is a bug somewhere causing the <UNKNOWN> part.

  • Falco version: 0.37.1
Tue Feb 27 14:35:54 2024: Falco version: 0.37.1 (x86_64)
Tue Feb 27 14:35:54 2024: Falco initialized with configuration file: /etc/falco/falco.yaml
Tue Feb 27 14:35:54 2024: System info: Linux version 5.14.0-284.11.1.el9_2.x86_64 ([email protected]) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) falcosecurity/falco#1 SMP PREEMPT_DYNAMIC Wed Apr 12 10:45:03 EDT 2023
{"default_driver_version":"7.0.0+driver","driver_api_version":"8.0.0","driver_schema_version":"2.0.0","engine_version":"31","engine_version_semver":"0.31.0","falco_version":"0.37.1","libs_version":"0.14.3","plugin_api_version":"3.2.0"}
  • System info:
Tue Feb 27 14:36:59 2024: Falco version: 0.37.1 (x86_64)
Tue Feb 27 14:36:59 2024: Falco initialized with configuration file: /etc/falco/falco.yaml
Tue Feb 27 14:36:59 2024: System info: Linux version 5.14.0-284.11.1.el9_2.x86_64 ([email protected]) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) falcosecurity/falco#1 SMP PREEMPT_DYNAMIC Wed Apr 12 10:45:03 EDT 2023
Tue Feb 27 14:36:59 2024: Loading rules from file /etc/falco/rules.d/01-falco-incubating_rules.yaml
Tue Feb 27 14:36:59 2024: Loading rules from file /etc/falco/rules.d/01-falco-sandbox_rules.yaml
Tue Feb 27 14:36:59 2024: Loading rules from file /etc/falco/rules.d/01-falco_rules.yaml
Tue Feb 27 14:36:59 2024: Loading rules from file /etc/falco/rules.d/60-nidhogg.yaml
Tue Feb 27 14:36:59 2024: /etc/falco/rules.d/60-nidhogg.yaml: Ok, with warnings
1 Warnings:
In rules content: (/etc/falco/rules.d/60-nidhogg.yaml:0:0)
    macro 'allowed_tmp_files': (/etc/falco/rules.d/60-nidhogg.yaml:41:2)
------
- macro: allowed_tmp_files
  ^
------
LOAD_UNUSED_MACRO (Unused macro): Macro not referred to by any other rule/macro

Tue Feb 27 14:36:59 2024: Loading rules from file /etc/falco/rules.d/99-settings.yaml
Tue Feb 27 14:36:59 2024: /etc/falco/rules.d/99-settings.yaml: Ok, with warnings
1 Warnings:
In rules content: (/etc/falco/rules.d/60-nidhogg.yaml:0:0)
    macro 'allowed_tmp_files': (/etc/falco/rules.d/60-nidhogg.yaml:41:2)
------
- macro: allowed_tmp_files
  ^
------
LOAD_UNUSED_MACRO (Unused macro): Macro not referred to by any other rule/macro

{
  "machine": "x86_64",
  "nodename": "soria.uio.no",
  "release": "5.14.0-284.11.1.el9_2.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP PREEMPT_DYNAMIC Wed Apr 12 10:45:03 EDT 2023"
}
  • Cloud provider or hardware configuration:
  • OS: Red Hat Enterprise Linux 9.3
  • Kernel: Linux some-host 5.14.0-284.11.1.el9_2.x86_64 falcosecurity/falco#1 SMP PREEMPT_DYNAMIC Wed Apr 12 10:45:03 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Installation method: RPM

petterreinholdtsen avatar Feb 27 '24 13:02 petterreinholdtsen

Thanks @petterreinholdtsen:

Just to clarify

Some times the events from sandbox rule "Write below root" contain a incomplete path

is truly sometimes, not all the times?

Would you mind observing again with the following additional output fields? %fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd (https://falco.org/docs/reference/rules/supported-fields/) -> are all of these fields missing?

In addition, is it some new issues or have you observed this before in Falco? Plus could you check if you have a decent amount of dropped open* syscalls leveraging https://falco.org/docs/metrics/falco-metrics/, see also https://falco.org/docs/troubleshooting/dropping/.

incertum avatar Feb 28 '24 05:02 incertum

[Melissa Kilby]

is truly sometimes, not all the times?

Yes. Here is an example with // instead of <UNKNOWN>:

12:26:02.996402180: Error File below / or /root opened for writing (file=//cf_lock.lmdb.lock evt_type=openat user=root user_uid=0 user_loginuid=-1 process=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O_RDWR container_id=host container_name=host)

This is on the same machine.

Would you mind observing again with the following additional output fields? %fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd (https://falco.org/docs/reference/rules/supported-fields/) -> are all of these fields missing?

I will see what I can do. I do not have direct access to any of the problematic machines, only the logs from them, but will try to figure out a way to test a modified setup one one of the machines with this behavior.

In addition, is it some new issues or have you observed this before in Falco? Plus could you check if you have a decent amount of dropped open* syscalls leveraging https://falco.org/docs/metrics/falco-metrics/, see also https://falco.org/docs/troubleshooting/dropping/.

In total for the last event bulk I see 18 "Falco internal: syscall event drop" events across 5 machines and 8000 events. All of them happened on one of the machines with the <UNKNOWN> path, but the <UNKNOWN> path show up on more than one machine.

-- Happy hacking Petter Reinholdtsen

petterreinholdtsen avatar Feb 28 '24 11:02 petterreinholdtsen

@petterreinholdtsen would it be possible to instead use the new https://falco.org/docs/metrics/falco-metrics/ framework as it supports monotonic counters? That way we know more accurately how many dropped events there are since the beginning of launching Falco and more. Thanks!

Looking forward to hearing back from you in regards to the other fields %fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd.

incertum avatar Feb 29 '24 18:02 incertum

[Melissa Kilby]

@petterreinholdtsen would it be possible to instead use the new https://falco.org/docs/metrics/falco-metrics/ framework as it supports monotonic counters?

I am reluctant to add this to all our hosts, as we try to reduce the noise and number of irrelevant events in the log, and the metrics seem add regular quite a lot of "non-events' to the log.

Looking forward to hearing back from you in regards to the other fields %fd.nameraw %fs.path.name %fs.path.nameraw %proc.cwd.

This change rolled out this weekend, and here are a few example log events, one with <UNKNOWN> and one with a more path like file value.

Error File below / or /root opened for writing (file=<UNKNOWN>cf_lock.lmdb.lock nameraw=cf_lock.lmdb.lock pathname=/etc/sssd/cf_lock.lmdb.lock pathnameraw=cf_lock.lmdb.lock proccwd=/etc/sssd/ evt_type=openat user=root user_uid=0 user_loginuid=-1 process=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O_RDWR container_id=host container_name=host)

Error File below / or /root opened for writing (file=//cf_lock.lmdb.lock nameraw=cf_lock.lmdb.lock pathname=/var/cfengine/state/cf_lock.lmdb.lock pathnameraw=cf _lock.lmdb.lock proccwd=/var/cfengine/state/ evt_type=openat user=root user_uid=0 user_loginuid=-1 pr ocess=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O _RDWR container_id=host container_name=host)

-- Happy hacking Petter Reinholdtsen

petterreinholdtsen avatar Mar 04 '24 09:03 petterreinholdtsen

[Petter Reinholdtsen]

This change rolled out this weekend, and here are a few example log events, one with <UNKNOWN> and one with a more path like file value.

Perhaps the relevant difference is the file permission of the cwd directory?

drwx------. 1 root root 36 jan. 12 01:00 /etc/sssd/ drwxr-xr-x. 1 root root 318 jan. 18 12:37 /var/cfengine/

Even with the latter, '//' seem like the wrong path.

-- Happy hacking Petter Reinholdtsen

petterreinholdtsen avatar Mar 04 '24 12:03 petterreinholdtsen

Re https://falco.org/docs/metrics/falco-metrics/ you can adjust the timer aka how often you want to emit a log. Just FYI.

Error File below / or /root opened for writing (file=<UNKNOWN>cf_lock.lmdb.lock nameraw=cf_lock.lmdb.lock pathname=/etc/sssd/cf_lock.lmdb.lock pathnameraw=cf_lock.lmdb.lock proccwd=/etc/sssd/ evt_type=openat user=root user_uid=0 user_loginuid=-1 process=cf-agent proc_exepath=/usr/bin/cf-agent parent=cf-execd command=cf-agent terminal=0 exe_flags=O_RDWR container_id=host container_name=host)

Thank you for deploying the additional output fields 🙏 !

@LucaGuerra and @FedeDP given that pathname=/etc/sssd/cf_lock.lmdb.lock is correct (using the same new custom path concatenation method) something must be wonky wrt to how we store the cwd to then concat the path for fd.name since for the fs.path filters we fetch the args directly. Or something else in the fd. filtercheck logic could be wonky.

@petterreinholdtsen this needs a patch in Falco's libs, ETA unknown and we should consider releasing a patch release. We will keep you updated. @falcosecurity/falco-maintainers @leogr

incertum avatar Mar 04 '24 18:03 incertum

Given your examples above https://github.com/falcosecurity/libs/issues/1892 effectively the derivation of fd.name is more sophisticated and perhaps fs.path.name is actually wrong here. fs.path.name always concatenates the proc.cwd and the raw open* syscall arg wrt to opening a file, but it may be inaccurate. In addition, assuming cf_lock.lmdb.lock in both logs you shared are at the same location you see that the full path is different when looking at fs.path.name.

CC @FedeDP who shared a similar assessment. In summary, I am not sure if there is something to fix easily. At the same time if fs.path.name is wrong we should do something about that and try to make them equal?

@petterreinholdtsen did you observe a regression between release or is this just a new observation? Furthermore, could you check for example what the true location of cf_lock.lmdb.lock is on your system? Percentage wise how rare is the observation of <UNKNOWN> or // at the beginning?

Code in question:

  • https://github.com/falcosecurity/libs/blob/600976ebb32a34d53d537bb811bbc069d78b61a5/userspace/libsinsp/sinsp_filtercheck_event.cpp#L553-L573
  • https://github.com/falcosecurity/libs/blob/600976ebb32a34d53d537bb811bbc069d78b61a5/userspace/libsinsp/parsers.cpp#L2501
  • https://github.com/falcosecurity/libs/blob/600976ebb32a34d53d537bb811bbc069d78b61a5/userspace/libsinsp/sinsp_filtercheck_fspath.cpp#L354

More references:

  • https://man7.org/linux/man-pages/man3/dirfd.3.html

incertum avatar Mar 05 '24 03:03 incertum

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Jun 03 '24 03:06 poiana

[Melissa Kilby]

@petterreinholdtsen did you observe a regression between release or is this just a new observation? Furthermore, could you check for example what the true location of cf_lock.lmdb.lock is on your system? Percentage wise how rare is the observation of <UNKNOWN> or // at the beginning?

I just noticed in during testing. Do not know how often it happen.

-- Happy hacking Petter Reinholdtsen

petterreinholdtsen avatar Jun 03 '24 06:06 petterreinholdtsen

/remove-lifecycle stale

This issue should be fixed in libs. I'm transferring this issue to the libs repo.

leogr avatar Jun 04 '24 08:06 leogr

cc @FedeDP @Andreagit97

leogr avatar Jun 04 '24 08:06 leogr

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Sep 02 '24 10:09 poiana

I'm still waiting for a resolution.

-- Happy hacking Petter Reinholdtsen

petterreinholdtsen avatar Sep 02 '24 10:09 petterreinholdtsen

/remove-lifecycle stale

Sorry we had no time to look into this during this release cycle; setting next milestone: /milestone 0.19.0

FedeDP avatar Sep 02 '24 12:09 FedeDP

/milestone 0.20.0

FedeDP avatar Nov 13 '24 09:11 FedeDP

/milestone 0.21.0

FedeDP avatar Jan 08 '25 09:01 FedeDP

Moving to next milestone once again :( /milestone 0.22.0

FedeDP avatar Apr 02 '25 10:04 FedeDP

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Jul 01 '25 22:07 poiana

/remove-lifecycle stale

Is this still an issue? 🤔

leogr avatar Jul 02 '25 09:07 leogr

I'm still waiting for a resolution.

/remove-lifecycle stale

petterreinholdtsen avatar Jul 02 '25 09:07 petterreinholdtsen