falco
Brainstorming: integrate Prometheus /metrics handler in Falco
Motivation
Although falco-exporter is a great tool for exposing Prometheus metrics through gRPC, I think there are some caveats to using it. Our (w/ @developer-guy @f9n) motivations for creating this issue are the following:
- To reduce unnecessary gRPC communication: we have to enable the gRPC output feature in the configuration. ^1
- To avoid managing mTLS certs: to connect to gRPC over a unix socket, we additionally have to manage CA certs, as described under connection options
- To manage one single Helm chart: there are two different Helm charts, falco and falco-exporter, that we have to manage and maintain in our infrastructure
- To reduce complexity: simplicity is always better, and in the current design, seeing the metrics in Grafana requires additional work
- To make local monitoring easier: we can instantly check the metrics using the /metrics endpoint in some troubleshooting scenarios:
$ kubectl port-forward svc/falco 8756
$ curl localhost:8756/metrics
I think that implementing Prometheus metrics in C++ would not be as easy as it looks. Here is an example metrics server that is currently used in Fluent Bit.
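To make the idea more concrete, here is a minimal sketch of what the core of such a handler might look like: an in-process counter registry that renders itself in the Prometheus text exposition format. This is not Falco or Fluent Bit code. The class name `CounterRegistry`, its methods, and the metric name used in the usage note are all hypothetical, and the sketch leaves out the HTTP server, labels, and thread safety.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical registry of monotonically increasing counters.
// Not thread-safe: a real implementation would need locking or atomics.
class CounterRegistry {
public:
    // Increment the named counter, creating it on first use.
    void inc(const std::string& name, const std::string& help,
             std::uint64_t by = 1) {
        Counter& c = counters_[name];
        c.help = help;
        c.value += by;
    }

    // Render all counters in the Prometheus text exposition format:
    // a "# HELP" line, a "# TYPE" line, then one sample line per counter.
    std::string render() const {
        std::ostringstream out;
        for (const auto& entry : counters_) {
            out << "# HELP " << entry.first << ' ' << entry.second.help << '\n'
                << "# TYPE " << entry.first << " counter\n"
                << entry.first << ' ' << entry.second.value << '\n';
        }
        return out.str();
    }

private:
    struct Counter {
        std::string help;
        std::uint64_t value = 0;
    };
    // std::map keeps metric names sorted, so the output order is stable.
    std::map<std::string, Counter> counters_;
};
```

An embedded HTTP handler would then only need to return `render()` as the body of a GET /metrics response. For example, calling `inc("falco_events_total", "...")` (an assumed metric name) after each processed event would make that counter show up in the next scrape.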
Feature
This feature was already clearly proposed in #421 and #530 and implemented in the projects.
Alternatives
- https://github.com/falcosecurity/falco-exporter
- https://github.com/falcosecurity/falcosidekick#prometheus
Additional context
Opening this issue does NOT mean that we should archive falco-exporter and provide these metrics only in Falco as a built-in. What I mostly want is to clarify and understand the design through the following questions:
- How can we make this process simpler by removing unnecessary dependencies?
- What is the correct way to do it?
- Why didn't Falco provide a /metrics endpoint in the first place?
- What is the motivation behind this design of separating the metrics module into another project?
Waiting for your feedback!
Kind ping here 🎗️
We can take a look at Fluent Bit's cmt_gauge.h for a C++ implementation.
FYI Falcosidekick exposes prom metrics exactly like you want, it can be managed with the same chart and it does not require mtls.
See: https://github.com/falcosecurity/falcosidekick#prometheus
> FYI Falcosidekick exposes prom metrics exactly like you want, it can be managed with the same chart and it does not require mtls.
Thanks, just noticed that! AFAICS, it exposes 3 metrics: falco, inputs, outputs
It could be very useful on cloud or small clusters. But someone who actively uses a log aggregator on the cluster, Fluent Bit for example, might not want to install and maintain an external app in the cluster. I mean, you'd have to make an additional HTTP request for each event, times the number of clusters. I am not sure how this fits in a pull-based system.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
/milestone 0.36.0
cross-linking https://github.com/falcosecurity/libs/issues/1463
cc @incertum
/assign