blackbox_exporter Keep failed result history per target

Open igorwwwwwwwwwwwwwwwwwwww opened this issue 3 years ago • 9 comments

Currently the result history is stored globally across all probes. This means that if there is one target that is constantly failing, and one that only fails occasionally, the failing one will kick the rare one out of the result history.

So when we then come in and try to understand why that rare failure occurred, it is likely gone from the history.

If we were to track these separately per target, it'd be much easier to figure out what happened, without having to increase the history limit.
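To make the idea concrete, here is a minimal sketch of a per-target failure history. It is not blackbox_exporter's actual implementation: the `Result` type, `PerTargetHistory`, and the example target names are all hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// Result is a hypothetical, simplified probe result record.
type Result struct {
	ID      int
	Target  string
	Success bool
}

// PerTargetHistory keeps up to limit failed results for each target,
// so a constantly failing target cannot evict another target's failures.
type PerTargetHistory struct {
	limit    int
	failures map[string][]Result
}

// NewPerTargetHistory returns an empty history with the given per-target limit.
func NewPerTargetHistory(limit int) *PerTargetHistory {
	return &PerTargetHistory{limit: limit, failures: make(map[string][]Result)}
}

// Add records r if it failed, dropping that target's oldest failure
// once the per-target limit is exceeded. Successful results are ignored.
func (h *PerTargetHistory) Add(r Result) {
	if r.Success {
		return
	}
	f := append(h.failures[r.Target], r)
	if len(f) > h.limit {
		f = f[len(f)-h.limit:]
	}
	h.failures[r.Target] = f
}

// Failures returns the retained failures for target, oldest first.
func (h *PerTargetHistory) Failures(target string) []Result {
	return h.failures[target]
}

func main() {
	h := NewPerTargetHistory(2)
	// One rare failure, followed by many failures of another target.
	h.Add(Result{ID: 1, Target: "rare.example", Success: false})
	for i := 2; i <= 100; i++ {
		h.Add(Result{ID: i, Target: "flaky.example", Success: false})
	}
	// The rare failure is still present despite the noisy target.
	fmt.Println(len(h.Failures("rare.example")), len(h.Failures("flaky.example")))
}
```

With a single global list of the same size, the `rare.example` failure would have been evicted long before the loop finished. A real implementation would also need to bound the number of tracked targets, since the map grows with every distinct target.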

Feb 19 '21 13:02 igorwwwwwwwwwwwwwwwwwwww

If I understand correctly what you are saying, you want to do some relabeling in Prometheus. For example, as shown here: https://www.robustperception.io/what-percentage-of-time-is-my-service-down-for

With that particular configuration each target will get its own "instance" value, and each module will get its own "job", so you can query the job/instance combination.

Is that what you are trying to do?

Feb 19 '21 19:02 mem

I think this is more about the history shown in the UI.

I think this is really difficult because the number of targets is unbounded: the target is whatever the requester asks for.

Feb 19 '21 19:02 roidelapluie

Oh, I understand.

I think you want to capture and upload blackbox_exporter logs, so that you can see the failure (e.g. probe_success) and go to the corresponding logs to identify the issue. You can use e.g. Loki for that.

Feb 22 '21 13:02 mem

We keep some amount of debug logs in memory in the exporter, so it's visible in the UI.

The difficulty we have is that we have a medium number of blackbox targets, around a couple hundred, broken down into 5 or so modules.

We can enable longer history, but the UI isn't organized by module or target, so it's hard to follow.

Feb 22 '21 21:02 SuperQ

The other issue is there's no option for the blackbox exporter to log failures only. So you can only run at debug level, which is too noisy.

Having an option like --probe.log-failures would make the logs shipped to Loki (or wherever) more useful.
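A rough sketch of what "log failures only" could look like: buffer each probe's debug lines in memory and emit them only if the probe fails. The --probe.log-failures flag is only a proposal in this thread, and `probeLog`, `Debugf`, and `FailureOutput` below are hypothetical names, not part of blackbox_exporter.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// probeLog buffers the debug lines of a single probe (hypothetical helper).
type probeLog struct {
	lines []string
}

// Debugf stores a formatted debug line instead of writing it immediately.
func (p *probeLog) Debugf(format string, args ...interface{}) {
	p.lines = append(p.lines, fmt.Sprintf(format, args...))
}

// FailureOutput returns the buffered lines, tagged with the target,
// when the probe failed. It returns "" for successful probes.
func (p *probeLog) FailureOutput(target string, success bool) string {
	if success {
		return ""
	}
	var b strings.Builder
	for _, l := range p.lines {
		fmt.Fprintf(&b, "target=%s msg=%q\n", target, l)
	}
	return b.String()
}

func main() {
	probes := []struct {
		target string
		ok     bool
	}{
		{"good.example", true},
		{"bad.example", false},
	}
	for _, pr := range probes {
		p := &probeLog{}
		p.Debugf("resolving %s", pr.target)
		p.Debugf("probe finished, success=%t", pr.ok)
		// Only the failed probe produces any log output.
		fmt.Fprint(os.Stderr, p.FailureOutput(pr.target, pr.ok))
	}
}
```

Because successful probes produce no output, the log stream stays quiet enough to ship to a log store without running everything at debug level.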

Feb 22 '21 21:02 SuperQ

I developed a proxy to do this. The proxy takes the /metrics call, adds ?debug=true to the query, passes it to blackbox_exporter, saves the logs and metrics in a CSV file, and returns the metrics to Prometheus.
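For readers who want to try something similar, below is a minimal sketch of such a proxy. It is not the code described above, which was never published. It assumes the request is forwarded to blackbox_exporter's /probe endpoint, and that the debug response separates its sections with the marker lines defined as constants; check those markers against your exporter version. `withDebug`, `splitDebug`, and `proxyHandler` are hypothetical names, and `main` uses a fake upstream so the sketch runs on its own.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"os"
	"strings"
	"sync"
	"time"
)

// Section markers assumed from blackbox_exporter's debug output.
const (
	metricsMarker = "Metrics that would have been returned:\n"
	configMarker  = "Module configuration:\n"
)

// withDebug returns the query string with debug=true set.
func withDebug(rawQuery string) string {
	q, _ := url.ParseQuery(rawQuery)
	q.Set("debug", "true")
	return q.Encode()
}

// splitDebug separates a debug response into its log and metrics parts.
// If the metrics marker is missing, the whole body is treated as logs.
func splitDebug(body string) (logs, metrics string) {
	i := strings.Index(body, metricsMarker)
	if i < 0 {
		return body, ""
	}
	logs = strings.TrimSpace(body[:i])
	metrics = body[i+len(metricsMarker):]
	if j := strings.Index(metrics, configMarker); j >= 0 {
		metrics = metrics[:j]
	}
	return logs, strings.TrimSpace(metrics) + "\n"
}

// proxyHandler forwards probe requests to upstream with debug=true,
// appends one CSV row per probe to out, and replies with only the metrics.
func proxyHandler(upstream string, out io.Writer) http.HandlerFunc {
	var mu sync.Mutex
	w := csv.NewWriter(out)
	return func(rw http.ResponseWriter, r *http.Request) {
		resp, err := http.Get(upstream + "/probe?" + withDebug(r.URL.RawQuery))
		if err != nil {
			http.Error(rw, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			http.Error(rw, err.Error(), http.StatusBadGateway)
			return
		}
		logs, metrics := splitDebug(string(body))
		mu.Lock()
		w.Write([]string{time.Now().Format(time.RFC3339), r.URL.Query().Get("target"), logs, metrics})
		w.Flush()
		mu.Unlock()
		io.WriteString(rw, metrics)
	}
}

func main() {
	// Fake upstream standing in for blackbox_exporter.
	fake := httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(rw, "Logs for the probe:\nmsg=\"probe failed\" debug=%s\n\n\n\n%sprobe_success 0\n\n\n\n%smodule: http_2xx\n",
			r.URL.Query().Get("debug"), metricsMarker, configMarker)
	}))
	defer fake.Close()

	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/probe?target=example.org&module=http_2xx", nil)
	proxyHandler(fake.URL, os.Stdout)(rec, req) // CSV row goes to stdout here
	fmt.Print(rec.Body.String())                // what Prometheus would receive
}
```

In a real deployment the handler would be served with `http.ListenAndServe`, the CSV would go to a file, and Prometheus would be pointed at the proxy instead of the exporter.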

Feb 22 '21 21:02 roidelapluie

(This is YOLO quality so it's not on github)

Feb 22 '21 21:02 roidelapluie

I think we can easily implement a flag for logging errors. Maybe @igorwwwwwwwwwwwwwwwwwwww would be interested in implementing this?

Feb 22 '21 21:02 SuperQ

I'd merge that.

Feb 22 '21 21:02 roidelapluie
