blackbox_exporter

Feature request: MD5 on a page

Open sachaz opened this issue 7 years ago • 18 comments

Hi

It's all in the title: this is a feature request for a new probe, an MD5 check against a specified value to verify the integrity of a page.

sachaz avatar Aug 27 '18 14:08 sachaz

There's already regex support to verify that an HTTP response contains given output. Is this not sufficient for your use case?

brian-brazil avatar Aug 27 '18 14:08 brian-brazil

The regex feature is really cool (thanks for this), but it is not enough to verify that the page has not changed. Matching a regex is not the same thing as verifying the integrity of a page with a checksum.

sachaz avatar Aug 28 '18 16:08 sachaz

What are you actually trying to test here?

The presumption is that you've got some form of web app whose content changes over time, so you want to look for e.g. key phrases rather than exact content which can change from release to release.

brian-brazil avatar Aug 28 '18 16:08 brian-brazil

OK, let's have some concrete examples :) I have some web app content giving the results of several tests, which always have to be the same. There is too much of it for a regex; an MD5 is easier in this case. Another example could be verifying the integrity of a site's pages: the developers can provide an MD5 for the pages, and the test can validate that your site has not been defaced.

sachaz avatar Aug 28 '18 16:08 sachaz

Checking the integrity of an entire website is a bit out of scope: this exporter is more for determining whether a website is working at all, and an integrity check is not something you want to be doing once a minute. A tool specifically designed for this may be better here.

brian-brazil avatar Aug 28 '18 16:08 brian-brazil

It might be an option to expose the sha256sum of the page as an info metric.

SuperQ avatar Aug 28 '18 22:08 SuperQ

That could vary from scrape to scrape, and thus would be too high cardinality.

brian-brazil avatar Aug 29 '18 06:08 brian-brazil

Let's be clear: the feature is requested to validate a single HTTP web page, not to check a whole site.

sachaz avatar Sep 04 '18 09:09 sachaz

I like that feature. You could even expose it just as a metric value. I think it would be useful for monitoring all kinds of assets for consistency, e.g. use it to check that your public key on a 3rd-party service wasn't modified, or your shasum file for a binary release on a package mirror, etc.

discordianfish avatar Sep 04 '18 09:09 discordianfish

@discordianfish absolutely

sachaz avatar Sep 08 '18 12:09 sachaz

I've taken a look into implementing this, and based on the comments I see the following options:

  • Add a setting to check for a match in a set of SHA-256 checksums, similar to the regex checks. This only really works for static data, as the probe config would need to be updated for every update on the page.
  • Add a setting to export the SHA-256 checksum as a metric with a note about cardinality. This only works for fairly static (rarely changing) data, because of the cardinality.
  • Export the CRC32 of the page as a metric value. This works for any page, but cannot be used for security purposes. Just using probe_http_last_modified_timestamp_seconds is probably better.
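
The first option above could be sketched roughly as follows. This is only an illustration of the idea under stated assumptions, not code from blackbox_exporter or from any branch mentioned in this thread: the helper name `bodyMatchesChecksum` and the shape of the allow-list are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// bodyMatchesChecksum is a hypothetical helper: it reports whether the
// SHA-256 of body equals any of the hex-encoded checksums in allowed.
// The comparison ignores the case of the hex digits.
func bodyMatchesChecksum(body []byte, allowed []string) bool {
	sum := sha256.Sum256(body)
	got := hex.EncodeToString(sum[:])
	for _, want := range allowed {
		if strings.EqualFold(got, want) {
			return true
		}
	}
	return false
}

func main() {
	// SHA-256 of the three bytes "abc" (the standard test vector).
	allowed := []string{
		"ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
	}
	fmt.Println(bodyMatchesChecksum([]byte("abc"), allowed)) // unchanged body
	fmt.Println(bodyMatchesChecksum([]byte("abd"), allowed)) // modified body
}
```

As noted in the first option, the allow-list would have to be updated in the probe config whenever the page legitimately changes.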

silkeh avatar Oct 12 '19 15:10 silkeh

Why not SHA as metric value? That's what I would do. But it looks like @brian-brazil doesn't want it anyway so this issue should probably be closed.

discordianfish avatar Oct 16 '19 09:10 discordianfish

> Why not SHA as metric value? That's what I would do.

Because metric values are double-precision floating point (float64), and a SHA is >64 bits. 64 bits is not sufficient to ensure that the content has not been tampered with. This limitation is why I suggested CRC32 above.

Concerning label values: this will result in high cardinality (see caution in the documentation), so it would need to be opt-in (and even then would not be a great idea). Play around with this branch if you really want to try it.

silkeh avatar Oct 16 '19 10:10 silkeh

An HTTP response does not have to be text. It can be binary data, in which case a regex will not work, and a content checksum seems to be a good idea for such data. In my case I want to check that the content of a dynamically generated PNG file does not change. Currently I can only check the status code and content length.

platan avatar Oct 16 '19 19:10 platan

> In this case a regex will not work.

Why do you think that? What problems did you encounter?

brian-brazil avatar Nov 21 '19 17:11 brian-brazil

> I've taken a look into implementing this, and based on the comments I see the following options:
>
>   • Add a setting to check for a match in a set of SHA-256 checksums, similar to the regex checks. This only really works for static data, as the probe config would need to be updated for every update on the page.

This approach would be a perfect fit for monitoring the integrity of security.txt files and PGP public keys linked from there.

znerol avatar Apr 19 '23 06:04 znerol

Yeah, I still think this would make a good feature. Now with Brian not being a maintainer anymore, I think it's likely that this would get merged. /cc @roidelapluie

discordianfish avatar Apr 26 '23 10:04 discordianfish