crowdsec-bouncer-traefik-plugin

[FEATURE] add metrics e.g. for Prometheus/Grafana

Open schenklklopfer opened this issue 1 year ago • 5 comments

Is your feature request related to a problem? Please describe. 🐛 Before using your fancy tool I used fbonalair/traefik-crowdsec-bouncer or the newer thespad/traefik-crowdsec-bouncer. Those exposed at least one metric: crowdsec_traefik_bouncer_processed_ip_total. But the more interesting information, like how many requests have been blocked/passed and maybe the ratio, was not there. I used to parse the log output with complicated LogQL queries in Loki.

The only way to do this here is to set logLevel to DEBUG and rewrite the LogQL queries to get the information from the debug log. But I am afraid the DEBUG log level might affect the performance of the plugin, so maybe this is not the best way.

Describe the solution you'd like ✨ A way to get standardized metrics from this plugin, ideally in a standard format like Prometheus metrics.

I can imagine metrics like:

  • crowdsec_traefik_bouncer_plugin_requests_blocked
  • crowdsec_traefik_bouncer_plugin_requests_passed
  • crowdsec_traefik_bouncer_plugin_ips_currently_blocked
  • crowdsec_traefik_bouncer_plugin_ips_ever_blocked (since startup)
  • crowdsec_traefik_bouncer_plugin_crowdsecMode
  • crowdsec_traefik_bouncer_plugin_iscrowdsecstreamhealthy
  • crowdsec_traefik_bouncer_plugin_updatefailure_count

If used:

  • crowdsec_traefik_bouncer_plugin_redis_stats
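For illustration only, here is a minimal sketch of how two of the proposed metrics might look in the Prometheus text exposition format. The `metric` type, the `render` helper, the help strings, and the sample values are all hypothetical; none of this exists in the plugin, and only the metric names come from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// metric is a hypothetical in-memory sample, not a type from the plugin.
type metric struct {
	name  string
	help  string
	kind  string // "counter" or "gauge"
	value float64
}

// render formats samples in the Prometheus text exposition format:
// a HELP line, a TYPE line, then "name value".
func render(ms []metric) string {
	var b strings.Builder
	for _, m := range ms {
		fmt.Fprintf(&b, "# HELP %s %s\n", m.name, m.help)
		fmt.Fprintf(&b, "# TYPE %s %s\n", m.name, m.kind)
		fmt.Fprintf(&b, "%s %g\n", m.name, m.value)
	}
	return b.String()
}

func main() {
	// Sample values are made up for the example.
	fmt.Print(render([]metric{
		{"crowdsec_traefik_bouncer_plugin_requests_blocked", "Requests blocked since startup.", "counter", 42},
		{"crowdsec_traefik_bouncer_plugin_ips_currently_blocked", "IPs currently held in the ban cache.", "gauge", 7},
	}))
}
```

Prometheus naming conventions usually add a `_total` suffix to counters, so the final names would probably differ slightly from the ones proposed here.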

Additional context: To build dashboards in e.g. Grafana that show how much the plugin is protecting the systems. Like this: [image]

schenklklopfer avatar May 22 '24 14:05 schenklklopfer

Hi @schenklklopfer,

We've thought about this, but we believe a plugin is not the best place to write an exporter.
We'd like to have those metrics of course, but a plugin, like a middleware, is made to take a request, do something with it (in our case block or captcha) and/or let the request continue.

Developing an exporter would mean having persistent storage for the stats and lots of writes and updates in the cache (memory or Redis).
I believe it could impact performance just as debug mode would. It would also mean we would have to intercept the request, check the path for a /metrics endpoint, read the cache, and format it, conditionally for each request, which would make the core code more complex than it already is.

CrowdSec provides native metrics: https://docs.crowdsec.net/docs/observability/prometheus. They are metrics from the LAPI and from log parsing, but they include the number of actions taken, for instance. Even without debug logs, you can use the access logs to count the number of 403s returned per IP.

In my opinion, once you've banned an IP for, say, 4 hours for a web scan, it does not matter whether it tries 1, 10, or thousands of requests, as long as they are blocked.

I will leave this issue open to see whether more people share this need, and we'll decide then.

mathieuHa avatar Jun 06 '24 16:06 mathieuHa

I think as long as it does not require a tremendous amount of effort and/or a sacrifice in performance, it could be quite useful. For someone setting it up for the first time, it would be helpful to know whether CrowdSec is actually blocking IPs from the community blocklist and that it has been set up correctly. As far as I know, there is currently no way to check for sure what the plugin itself blocked. I have the CrowdSec plugin sitting in a middleware chain before geoblock, and I have no way of knowing whether a given 403 came from CrowdSec or geoblock. Yes, you could pretend to be an attacker from your phone, but what if something breaks during an update or after a config change, and you don't bother doing that? It would be noticeable in metrics, which would give some users peace of mind. Also, I see that other remediation components are adding metrics: the firewall bouncer has them already, and here it seems to be in the works too: https://github.com/crowdsecurity/lua-cs-bouncer/pull/80

tannisroot avatar Jan 31 '25 18:01 tannisroot

I believe the way forward here is to use the LAPI metrics endpoint to feed crowdsec with remediation metrics.

https://crowdsecurity.github.io/api_doc/index.html?urls.primaryName=LAPI#/

https://docs.crowdsec.net/docs/next/observability/usage_metrics/

david-garcia-garcia avatar Feb 06 '25 13:02 david-garcia-garcia

I had a look at the CrowdSec Prometheus metrics documentation. There is only one metric that concerns bouncers: "cs_lapi_bouncer_requests_total". As I use the bouncer in "stream" mode, it makes one request every 5 seconds.

...
          updateIntervalSeconds: 5
          crowdsecMode: stream
...

So the metric shows an increment every 5 seconds. This is not helpful.

Looking at 403 HTTP status codes is not helpful either, as @tannisroot mentioned above, because those codes can be triggered by other middlewares too.

schenklklopfer avatar Feb 07 '25 10:02 schenklklopfer

Hey, to handle this case specifically, you can use the RemediationHeadersCustomName variable, which sets a header on the response when the bouncer blocks a request. A 403 plus that header gives you all the information you need.

maxlerebourg avatar Feb 07 '25 10:02 maxlerebourg

Also interested in having remediation metrics appearing in the console. Thanks!

tiangao88 avatar May 30 '25 07:05 tiangao88