Feature request: MD5 check on a page
Hi
It's all in the title: this is a feature request for a new probe check that compares the MD5 of a page against a specified value, to verify the integrity of the page.
There's already regex support to verify that an HTTP response contains a given output. Is this not sufficient for your use case?
The regex feature is really cool (thanks for this), but it is not enough to verify that the page has not changed. Matching a pattern is not the same thing as verifying the integrity of a page with a checksum.
What are you actually trying to test here?
The presumption is that you've got some form of web app whose content changes over time, so you want to look for e.g. key phrases rather than exact content which can change from release to release.
OK, let's have some concrete examples :) I have a web app page that reports the results of several tests, and those results must always be the same. There is too much content to cover with a regex; an MD5 is easier in this case. Another example would be verifying the integrity of a site's pages: the developers can provide an MD5 for each page, and the test can validate that your site has not been defaced.
Checking the integrity of an entire website is a bit out of scope, and not something you want to be doing once a minute. This exporter is more for determining whether a website is working at all. A tool specifically designed for this may be better here.
It might be an option to expose the sha256sum of the page as an info metric.
That could vary from scrape to scrape, and thus would be too high cardinality.
To be clear: the request is to validate a single HTTP web page, not to check a whole site.
I like that feature. You could even expose it just as a metric value. I think it would be useful for monitoring all kinds of assets for consistency, e.g. to check that your public key on a third-party service wasn't modified, or that the shasum file for a binary release on a package mirror is unchanged.
@discordianfish absolutely
I've taken a look into implementing this, and based on the comments I see the following options:
- Add a setting to check for a match in a set of SHA-256 checksums, similar to the regex checks. This only really works for static data, as the probe config would need to be updated for every update on the page.
- Add a setting to export the SHA-256 checksum as a metric label, with a note about cardinality. This only works for fairly static (rarely changing) data, because every new checksum creates a new time series.
- Export the CRC32 of the page as a metric value. This works for any page, but cannot be used for security purposes. Just using probe_http_last_modified_timestamp_seconds is probably better.
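To make the first and third options concrete, here is a minimal sketch in Python (not the exporter's own code). The function names and the idea of passing a set of allowed checksums are assumptions made for illustration; they do not correspond to any existing probe setting.

```python
import hashlib
import zlib
from typing import Iterable


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_allowed_checksum(body: bytes, allowed: Iterable[str]) -> bool:
    """Option 1 (hypothetical): pass only if the body's SHA-256 is in a configured set."""
    return sha256_hex(body) in {c.lower() for c in allowed}


def crc32_metric_value(body: bytes) -> float:
    """Option 3 (hypothetical): a 32-bit CRC, small enough to be a metric value.

    CRC32 detects accidental changes only; it is not a security check.
    """
    return float(zlib.crc32(body))


# Works the same for text and binary bodies, since it operates on raw bytes.
body = b"hello\n"
allowed = {sha256_hex(body)}  # in practice this would come from the probe config
probe_ok = matches_allowed_checksum(body, allowed)
tampered_ok = matches_allowed_checksum(b"hello!\n", allowed)
crc_value = crc32_metric_value(body)
```

With the set-based check, the configuration has to be updated whenever the page legitimately changes, which is the limitation noted for the first option.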
Why not SHA as metric value? That's what I would do. But it looks like @brian-brazil doesn't want it anyway so this issue should probably be closed.
> Why not SHA as metric value? That's what I would do.
Because metric values are double-precision floating point (float64), and a SHA digest is longer than 64 bits (SHA-256 is 256 bits). 64 bits is not sufficient to ensure that the content has not been tampered with. This limitation is why I suggested CRC32 above.
Concerning label values: this will result in high cardinality (see the caution in the documentation), so it would need to be opt-in (and even then would not be a great idea). Play around with this branch if you really want to try it.
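A small Python sketch of the size argument, assuming only that a metric value is an IEEE 754 double. A float64 has a 53-bit significand, so it represents every integer exactly only up to 2^53, which is even less than 64 bits. The helper name survives_float64 is made up for this illustration.

```python
import hashlib
import zlib


def survives_float64(n: int) -> bool:
    """True if the integer n is unchanged after a round trip through a float64."""
    return int(float(n)) == n


body = b"example page body"

# A SHA-256 digest is 256 bits; read as an integer it is far wider than 53 bits.
digest = hashlib.sha256(body).digest()
full = int.from_bytes(digest, "big")

# A CRC32 always fits in 32 bits, so it is stored exactly in a float64.
crc = zlib.crc32(body)

print("digest bits:", len(digest) * 8)
print("crc32 survives float64:", survives_float64(crc))
print("2**53 survives float64:", survives_float64(2**53))
print("2**53 + 1 survives float64:", survives_float64(2**53 + 1))
```

Anything wider than 53 bits would have to be truncated or rounded before being exported as a value, which is why a full SHA can only be carried in a label.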
An HTTP response does not have to be text. It can be binary data, in which case a regex will not work, and a content checksum seems to be a good idea for such data. In my case I want to check that the content of a dynamically generated PNG file does not change. Currently I can only check the status code and content length.
> In this case a regex will not work.
Why do you think that? What problems did you encounter?
> I've taken a look into implementing this, and based on the comments I see the following options:
> - Add a setting to check for a match in a set of SHA-256 checksums, similar to the regex checks. This only really works for static data, as the probe config would need to be updated for every update on the page.
This approach would be a perfect fit for monitoring the integrity of security.txt files and PGP public keys linked from there.
Yeah, I still think this would make a good feature. Now that Brian is no longer a maintainer, I think it's likely that this would get merged. /cc @roidelapluie