falco icon indicating copy to clipboard operation
falco copied to clipboard

[DISCUSSION] deprecate / remove old internal drop stats?

Open incertum opened this issue 9 months ago • 5 comments

@Andreagit97 opening a dedicated issue to discuss https://github.com/falcosecurity/libs/pull/1433#issuecomment-1807694658

I saw that in Falco (and I imagine also in other consumers) we use the get_capture_stats method to obtain the number of drops/events. On the other hand, with stats_v2 we are using an agnostic approach, where the final consumer receives a vector of metrics already populated by sinsp. My question here is, do we want scap_stats_v2 to replace the old scap_stats? If yes, how do we obtain the specific number of drops/events from this agnostic approach? Do we want to keep these specific numbers or the final goal is to expose a set of metrics with a Prometheus endpoint?

First and foremost we are talking about https://github.com/falcosecurity/falco/blob/master/userspace/falco/event_drops.cpp aka Falco internal: syscall event drop that we will call "old drop stats" versus the new metrics Falco option that is also capable of creating an internal rule Falco internal: metrics snapshot containing not just the drop counters but also more metrics.

@leogr starting to summarize a few shortcomings of the old stats from my perspective. At the same time I would be honoring that some adopters prefer to keep the old stats around for longer. Therfore I would be fine keeping it, but also willing to help work out a transition plan.

Cons old drop stats:

  • Old stats report the number of drops within a 1s time frame window which may be a less transparent or intuitive metric versus being able to define your deltas based on what you think has the most meaning (the new metrics framework snapshots the current drop counts at an interval you choose and subsequently you can derive your own deltas the way you prefer)
  • Can generate an unpredictable high amount of logs
  • By default only reports drops when drop percentage is above 10%
  • Enabled by default
  • Not intuitive to customize or even turn off other than adjusting a logging level for Falco rules

Pros old drop stats:

  • Not bound to a fixed metrics interval to send the first alert of drops occurring

incertum avatar Nov 14 '23 00:11 incertum

yeah we need to explore a little bit all the usages of these old drops stats in Falco, to understand if we really need them or if we can just replace them with the new ones

Andreagit97 avatar Nov 20 '23 14:11 Andreagit97

When I asked on slack no one seems to be urgently still needing this.

In the last 3+ debugging sessions I have been involved, we always found the newer metrics feature to provide more actionable insights.

Proposing to introduce a deprecation warning for Falco 0.38 or Falco 0.37 and then follow the formal deprecation cycle? WDYT @falcosecurity/falco-maintainers?

Besides the pros and cons I listed above it will help communicate easier to follow debugging steps and reduce the config surface, effectively making space for new configs that will move the needle in terms of improving Falco's performance and capabilities.

incertum avatar Jan 03 '24 17:01 incertum

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Apr 02 '24 21:04 poiana

/remove-lifecycle stale

Andreagit97 avatar Apr 03 '24 08:04 Andreagit97

/assign

added to my backlog :angel:

Tentatively for /milestone 0.38.0

leogr avatar Apr 03 '24 14:04 leogr