Regression: host metrics are not available running only source plugins
Describe the bug
After this change, https://github.com/falcosecurity/libs/pull/1362 host metrics will no longer be available when running a source plugin like k8saduit. This is a regression and we need to find a way to address it.
Metrics extracted with dummy plugin in Falco 0.36.0-rc3
{
"output_fields": {
"evt.source": "dummy",
"evt.time": 1695381599561603364,
"falco.container_memory_used": 0,
"falco.cpu_usage_perc": 0.0,
"falco.duration_sec": 1695381599,
"falco.host_boot_ts": 0,
"falco.host_num_cpus": 0,
"falco.hostname": "",
"falco.kernel_release": "",
"falco.memory_pss": 25,
"falco.memory_rss": 29,
"falco.memory_vsz": 1779,
"falco.num_evts": 18641280,
"falco.num_evts_prev": 0,
"falco.outputs_queue_num_drops": 0,
"falco.start_ts": 0,
"falco.version": "0.36.0-rc3",
"scap.engine_name": "source_plugin"
},
"sample": 1
}
{
"output_fields": {
"evt.source": "dummy",
"evt.time": 1695381604561553388,
"falco.container_memory_used": 0,
"falco.cpu_usage_perc": 0.1,
"falco.duration_sec": 1695381604,
"falco.evts_rate_sec": 3573155.7144059963,
"falco.host_boot_ts": 0,
"falco.host_num_cpus": 0,
"falco.hostname": "",
"falco.kernel_release": "",
"falco.memory_pss": 25,
"falco.memory_rss": 29,
"falco.memory_vsz": 1908,
"falco.num_evts": 36506880,
"falco.num_evts_prev": 18641280,
"falco.outputs_queue_num_drops": 0,
"falco.start_ts": 0,
"falco.version": "0.36.0-rc3",
"scap.engine_name": "source_plugin"
},
"sample": 2
}
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
Pitching one approach to get out of this regression here: https://github.com/falcosecurity/libs/pull/1362#issuecomment-2238352359
Oh hey @gnosek
Special notes for your reviewer: This is definitely not the first time around the block with this, so I'm waiting for the inevitable "revert this" PR ;)
... the day has come ;)
Instead of reverting, would the following be acceptable to get out of the regression for the host metrics falcosecurity/falco#2821 when you run Falco w/ plugin source only, but still on Linux.
- Add a new
sinsp_mode_tcalledSINSP_MODE_PLUGIN_HOST_METRICS(or similar) that complementsSINSP_MODE_PLUGINwhen running Falco w/ a plugin source only (no syscalls), but also with the Falco metrics framework enabled- When running in mode
SINSP_MODE_PLUGIN_HOST_METRICS, allocate a new platform that could be calledscap_linux_host_metrics_alloc_platformor similar. It would basically be a Linux light platform that would only alloc what is needed in order to havem_machine_info,m_agent_infoand theiflist(for a new metrics field in the making) available to the metrics framework.- Other ideas?
Hey @incertum, long time no see :)
I like the idea of a separate linux-lite platform that would skip the process list, but still collect non-process metadata. Having said that, I can't express how much I dislike adding a new mode (and the concept of sinsp modes in general) :)
My suggestion would be:
- add the linux-lite platform (as a scap_platform implementation; would just share most of the code with the full linux one)
- introduce a special-purpose enum for the platform selection (generic/lite/full linux)
- replace the sinsp_mode_t param to sinsp::open_plugin with the enum (we'd still pick a mode for sinsp; would probably be SINSP_MODE_PLUGIN for the lite platform)
I am going to sidestep the question of which should be the default by making the parameter mandatory, and therefore the caller's problem :D
I went through the libs code and the specific mode doesn't seem to matter in this case, so let's try and slowly move towards having no modes (they're effectively a class hierarchy smeared all over the code).
Sadly, this will break the sinsp API, but it feels better than adding a new mode value that's only ever used in one place (and would have to be handled in is_plugin or is_live)
[side rant: scap does not care about machine_info etc. one bit, except for capture files so this could all happily live in sinsp; this would need a giant-but-ultimately-trivial PR to untangle that I have made several times already but never ended up submitting it because $REASONS]
https://github.com/falcosecurity/libs/pull/1969
@gnosek yes SGTM https://github.com/falcosecurity/falco/issues/2821#issuecomment-2242124223
Also thanks for opening the PR, I was gonna suggest that you would be most suited for this clean up ;)
re
scap does not care about machine_info etc. one bit, except for capture files so this could all happily live in sinsp;
We are still always going back and forth with what should be in sinsp vs scap. Btw @mrgian is moving more of the metrics stuff to scap in order to make it multi-OS, see the PR https://github.com/falcosecurity/libs/pull/1870.
Sigh :)
That looks like it could use moving libsinsp/metrics_collector.cpp to something like libsinsp/linux/metrics_collector.cpp and some build system changes to avoid #ifdef __linux__ in platform-agnostic code (either provide a stub for other platforms, or a #define HAS_METRICS_COLLECTOR or some such)
I'm not opposed to platform-specific code in libsinsp (honestly, there's little else in there), I'm just opposed to spreading it around the codebase with ifdefs or hidden assumptions. I'd love to see e.g. platform-agnostic interfaces with platform-specific implementations.
Ceterum censeo libscap esse delendam :)
Ceterum censeo libscap esse delendam :)
:) looking forward to it!
Also thanks for posting on Gianmatteo's PR with few suggestions!