bind_exporter
bind_boot_time_seconds appears to be shifting strangely
The issue appears to be the use of max(node_time_seconds{instance=~"$node:.*"}), which does not seem to work in Prometheus 2.19, combined with a possible change in how Bind 9.16 reports uptime. Changing the queries to time() - max(bind_boot_time_seconds{instance=~"$node:.*"}) produces sensible-seeming results, but these are actually still off by an order of magnitude.
For example, a Bind 9.16 reported boot time of 2020-07-14T21:10:48.999Z with a current time of 2020-07-14T22:11:56.299Z will report incorrectly with the above, claiming 5.8 hours. I thought I had found the order-of-magnitude error, but then I noticed that something was still wrong because it wasn't updating correctly. That's when I noticed that bind_boot_time_seconds was moving. It went from 1594766106 to 1594740334, which is definitely not correct.
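As a sanity check on the numbers quoted above, here is a minimal Go sketch that computes the uptime implied by the two RFC 3339 timestamps and converts the two observed metric values back to UTC. The helper names uptimeHours and unixToRFC3339 are made up for this illustration and are not part of the exporter.

```go
package main

import (
	"fmt"
	"time"
)

// uptimeHours returns the hours elapsed between two RFC 3339 timestamps.
// (Hypothetical helper, for illustration only.)
func uptimeHours(boot, now string) (float64, error) {
	b, err := time.Parse(time.RFC3339, boot)
	if err != nil {
		return 0, err
	}
	n, err := time.Parse(time.RFC3339, now)
	if err != nil {
		return 0, err
	}
	return n.Sub(b).Hours(), nil
}

// unixToRFC3339 renders a Unix timestamp (seconds) as a UTC RFC 3339 string.
func unixToRFC3339(ts int64) string {
	return time.Unix(ts, 0).UTC().Format(time.RFC3339)
}

func main() {
	h, err := uptimeHours("2020-07-14T21:10:48.999Z", "2020-07-14T22:11:56.299Z")
	if err != nil {
		panic(err)
	}
	fmt.Printf("uptime from the reported timestamps: %.2f hours\n", h) // 1.02

	// The two values seen for bind_boot_time_seconds.
	for _, ts := range []int64{1594766106, 1594740334} {
		fmt.Println(ts, "=", unixToRFC3339(ts))
	}
	// 1594766106 = 2020-07-14T22:35:06Z
	// 1594740334 = 2020-07-14T15:25:34Z
}
```

The timestamps imply roughly one hour of uptime rather than 5.8, and neither observed metric value corresponds to the 21:10:48 boot time that Bind reported.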
The actual Bind statistics do not reflect a change in the corresponding XML or JSON.
The exporter is only taking what Bind reports in the boot-time XML field. It's a reasonably simple XML parse to Go time.Time.
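To illustrate the kind of parse being described, here is a rough sketch using encoding/xml, which can unmarshal an RFC 3339 string straight into a time.Time. The struct, the trimmed sample document, and the server>boot-time path are assumptions made for this example and may not match the exporter's actual types.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"time"
)

// Trimmed, hand-written sample; a real statistics document is far larger.
const sampleXML = `<statistics version="3.11"><server>` +
	`<boot-time>2020-07-14T21:10:48.999Z</boot-time>` +
	`</server></statistics>`

// bootStats captures only the field of interest (assumed layout).
type bootStats struct {
	BootTime time.Time `xml:"server>boot-time"`
}

// bootTimeUnix parses the document and returns the boot time in Unix seconds.
func bootTimeUnix(doc string) (int64, error) {
	var s bootStats
	if err := xml.Unmarshal([]byte(doc), &s); err != nil {
		return 0, err
	}
	return s.BootTime.Unix(), nil
}

func main() {
	ts, err := bootTimeUnix(sampleXML)
	if err != nil {
		panic(err)
	}
	fmt.Println(ts) // 1594761048
}
```

With this input the parse yields 1594761048, which is the value a gauge derived from it would be expected to hold for as long as the XML is unchanged.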
Without examining the raw metric data, it's hard to say what's going on.
That's what I'm saying: somehow it's mangling the raw metric. I am positive of this. I checked the raw data. The raw value in the XML is correct and, more importantly, does not change. Yet the exporter is changing the value from the XML in strange ways, and this is reflected in the raw data from the exporter.
The only thing we could do here is to build a version with logging of the raw XML data to see what's returned. Without some concrete proof that the exporter is doing something, there's nothing we can do.
I definitely agree; this is going to need some XML dumping. However, I don't see any way to do that, and frankly I have zero experience working in Go (so I suspect my attempt would mangle the output).
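One possible way to get such a dump without changing how the data is parsed is to wrap the response body in an io.TeeReader, so that every byte the XML decoder consumes is also copied to a log or file. This is only a sketch: decodeWithDump and the bootStats struct are hypothetical, and a strings.Reader stands in for the HTTP response body.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
	"time"
)

const sampleXML = `<statistics version="3.11"><server>` +
	`<boot-time>2020-07-14T21:10:48.999Z</boot-time>` +
	`</server></statistics>`

// bootStats captures only the field of interest (assumed layout).
type bootStats struct {
	BootTime time.Time `xml:"server>boot-time"`
}

// decodeWithDump decodes XML from r and copies every byte read to dump.
func decodeWithDump(r io.Reader, dump io.Writer) (bootStats, error) {
	var s bootStats
	err := xml.NewDecoder(io.TeeReader(r, dump)).Decode(&s)
	return s, err
}

// dumpAndParse returns the raw bytes seen by the decoder and the parsed
// boot time in Unix seconds.
func dumpAndParse(doc string) (string, int64, error) {
	var raw bytes.Buffer
	s, err := decodeWithDump(strings.NewReader(doc), &raw)
	if err != nil {
		return raw.String(), 0, err
	}
	return raw.String(), s.BootTime.Unix(), nil
}

func main() {
	raw, ts, err := dumpAndParse(sampleXML)
	if err != nil {
		panic(err)
	}
	fmt.Println("raw XML seen by the decoder:", raw)
	fmt.Println("parsed boot time:", ts)
}
```

Comparing the dumped XML with the parsed value on each scrape would show whether the input or the parse is responsible for the shift.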
It's also probably safe to go ahead and drop the xml.v2 channel completely, as that was fully discontinued with 9.10 (which went fully EOL in 2018). Maybe that would make debugging easier as well.
That reminds me, we should add support for the new JSON format.
https://github.com/prometheus-community/bind_exporter/issues/82
This is intriguing. Whilst working on a very trivial patch to eliminate ioutil.ReadAll when unmarshalling the XML, I hit a test failure which I haven't yet resolved. However, in the failed test output, there are bind_boot_time_seconds timestamps that are slightly shifted vs. what the tests expect. I cannot find anything in the test fixtures to explain this.
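For context, the kind of change described here usually amounts to swapping a read-everything-then-unmarshal step for a streaming decoder. The sketch below shows both variants side by side and checks that they agree on the boot time. It is not the actual patch: the function names and struct are invented for the example, and io.ReadAll is used as the modern equivalent of ioutil.ReadAll.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
	"time"
)

const sampleXML = `<statistics version="3.11"><server>` +
	`<boot-time>2020-07-14T21:10:48.999Z</boot-time>` +
	`</server></statistics>`

// bootStats captures only the field of interest (assumed layout).
type bootStats struct {
	BootTime time.Time `xml:"server>boot-time"`
}

// decodeBuffered reads the whole body into memory, then unmarshals it.
func decodeBuffered(r io.Reader) (bootStats, error) {
	var s bootStats
	data, err := io.ReadAll(r)
	if err != nil {
		return s, err
	}
	err = xml.Unmarshal(data, &s)
	return s, err
}

// decodeStreaming lets the decoder pull bytes from the reader directly.
func decodeStreaming(r io.Reader) (bootStats, error) {
	var s bootStats
	err := xml.NewDecoder(r).Decode(&s)
	return s, err
}

// compareDecoders reports whether both paths yield the same boot time,
// along with the streaming result in Unix seconds.
func compareDecoders(doc string) (bool, int64, error) {
	a, err := decodeBuffered(strings.NewReader(doc))
	if err != nil {
		return false, 0, err
	}
	b, err := decodeStreaming(strings.NewReader(doc))
	if err != nil {
		return false, 0, err
	}
	return a.BootTime.Equal(b.BootTime), b.BootTime.Unix(), nil
}

func main() {
	same, ts, err := compareDecoders(sampleXML)
	if err != nil {
		panic(err)
	}
	fmt.Println("decoders agree:", same, "boot time:", ts)
}
```

If both paths agree on a given fixture, the shifted timestamps are more likely to come from a later step than from the decode itself.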