falco icon indicating copy to clipboard operation
falco copied to clipboard

Regression: host metrics are not available running only source plugins

Open Andreagit97 opened this issue 2 years ago • 10 comments

Describe the bug

After this change, https://github.com/falcosecurity/libs/pull/1362 host metrics will no longer be available when running a source plugin like k8saduit. This is a regression and we need to find a way to address it.

Metrics extracted with dummy plugin in Falco 0.36.0-rc3

{
	"output_fields": {
		"evt.source": "dummy",
		"evt.time": 1695381599561603364,
		"falco.container_memory_used": 0,
		"falco.cpu_usage_perc": 0.0,
		"falco.duration_sec": 1695381599,
		"falco.host_boot_ts": 0,
		"falco.host_num_cpus": 0,
		"falco.hostname": "",
		"falco.kernel_release": "",
		"falco.memory_pss": 25,
		"falco.memory_rss": 29,
		"falco.memory_vsz": 1779,
		"falco.num_evts": 18641280,
		"falco.num_evts_prev": 0,
		"falco.outputs_queue_num_drops": 0,
		"falco.start_ts": 0,
		"falco.version": "0.36.0-rc3",
		"scap.engine_name": "source_plugin"
	},
	"sample": 1
}
{
	"output_fields": {
		"evt.source": "dummy",
		"evt.time": 1695381604561553388,
		"falco.container_memory_used": 0,
		"falco.cpu_usage_perc": 0.1,
		"falco.duration_sec": 1695381604,
		"falco.evts_rate_sec": 3573155.7144059963,
		"falco.host_boot_ts": 0,
		"falco.host_num_cpus": 0,
		"falco.hostname": "",
		"falco.kernel_release": "",
		"falco.memory_pss": 25,
		"falco.memory_rss": 29,
		"falco.memory_vsz": 1908,
		"falco.num_evts": 36506880,
		"falco.num_evts_prev": 18641280,
		"falco.outputs_queue_num_drops": 0,
		"falco.start_ts": 0,
		"falco.version": "0.36.0-rc3",
		"scap.engine_name": "source_plugin"
	},
	"sample": 2
}

Andreagit97 avatar Sep 21 '23 09:09 Andreagit97

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Dec 21 '23 15:12 poiana

/remove-lifecycle stale

Andreagit97 avatar Dec 22 '23 08:12 Andreagit97

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Apr 29 '24 21:04 poiana

/remove-lifecycle stale

Andreagit97 avatar Apr 30 '24 07:04 Andreagit97

Pitching one approach to get out of this regression here: https://github.com/falcosecurity/libs/pull/1362#issuecomment-2238352359

incertum avatar Jul 19 '24 06:07 incertum

Oh hey @gnosek

Special notes for your reviewer: This is definitely not the first time around the block with this, so I'm waiting for the inevitable "revert this" PR ;)

... the day has come ;)

Instead of reverting, would the following be acceptable to get out of the regression for the host metrics falcosecurity/falco#2821 when you run Falco w/ plugin source only, but still on Linux.

  • Add a new sinsp_mode_t called SINSP_MODE_PLUGIN_HOST_METRICS (or similar) that complements SINSP_MODE_PLUGIN when running Falco w/ a plugin source only (no syscalls), but also with the Falco metrics framework enabled
  • When running in mode SINSP_MODE_PLUGIN_HOST_METRICS, allocate a new platform that could be called scap_linux_host_metrics_alloc_platform or similar. It would basically be a Linux light platform that would only alloc what is needed in order to have m_machine_info, m_agent_info and the iflist (for a new metrics field in the making) available to the metrics framework.
  • Other ideas?

Hey @incertum, long time no see :)

I like the idea of a separate linux-lite platform that would skip the process list, but still collect non-process metadata. Having said that, I can't express how much I dislike adding a new mode (and the concept of sinsp modes in general) :)

My suggestion would be:

  • add the linux-lite platform (as a scap_platform implementation; would just share most of the code with the full linux one)
  • introduce a special-purpose enum for the platform selection (generic/lite/full linux)
  • replace the sinsp_mode_t param to sinsp::open_plugin with the enum (we'd still pick a mode for sinsp; would probably be SINSP_MODE_PLUGIN for the lite platform)

I am going to sidestep the question of which should be the default by making the parameter mandatory, and therefore the caller's problem :D

I went through the libs code and the specific mode doesn't seem to matter in this case, so let's try and slowly move towards having no modes (they're effectively a class hierarchy smeared all over the code).

Sadly, this will break the sinsp API, but it feels better than adding a new mode value that's only ever used in one place (and would have to be handled in is_plugin or is_live)

[side rant: scap does not care about machine_info etc. one bit, except for capture files so this could all happily live in sinsp; this would need a giant-but-ultimately-trivial PR to untangle that I have made several times already but never ended up submitting it because $REASONS]

gnosek avatar Jul 22 '24 05:07 gnosek

https://github.com/falcosecurity/libs/pull/1969

gnosek avatar Jul 22 '24 06:07 gnosek

@gnosek yes SGTM https://github.com/falcosecurity/falco/issues/2821#issuecomment-2242124223

Also thanks for opening the PR, I was gonna suggest that you would be most suited for this clean up ;)

re

scap does not care about machine_info etc. one bit, except for capture files so this could all happily live in sinsp;

We are still always going back and forth with what should be in sinsp vs scap. Btw @mrgian is moving more of the metrics stuff to scap in order to make it multi-OS, see the PR https://github.com/falcosecurity/libs/pull/1870.

incertum avatar Jul 22 '24 16:07 incertum

Sigh :)

That looks like it could use moving libsinsp/metrics_collector.cpp to something like libsinsp/linux/metrics_collector.cpp and some build system changes to avoid #ifdef __linux__ in platform-agnostic code (either provide a stub for other platforms, or a #define HAS_METRICS_COLLECTOR or some such)

I'm not opposed to platform-specific code in libsinsp (honestly, there's little else in there), I'm just opposed to spreading it around the codebase with ifdefs or hidden assumptions. I'd love to see e.g. platform-agnostic interfaces with platform-specific implementations.

Ceterum censeo libscap esse delendam :)

gnosek avatar Jul 23 '24 08:07 gnosek

Ceterum censeo libscap esse delendam :)

:) looking forward to it!

Also thanks for posting on Gianmatteo's PR with few suggestions!

incertum avatar Jul 23 '24 22:07 incertum