bind_exporter

bind_boot_time_seconds appears to be shifting strangely

Open rootwyrm opened this issue 4 years ago • 6 comments

The issue appears to be twofold: the use of max(node_time_seconds{instance=~"$node:.*"}), which does not seem to work in Prometheus 2.19, and a possible change in how BIND 9.16 reports uptime. Changing the queries to time() - max(bind_boot_time_seconds{instance=~"$node:.*"}) produces sensible-looking results, but these are actually still off by an order of magnitude.

For example, with a BIND 9.16 reported boot time of 2020-07-14T21:10:48.999Z and a current time of 2020-07-14T22:11:56.299Z, the query above reports incorrectly, claiming an uptime of 5.8 hours when the two timestamps are only about 1 hour apart. I thought I had found the order-of-magnitude error, but something was still wrong because the value wasn't updating correctly. That's when I noticed that bind_boot_time_seconds itself was moving: it went from 1594766106 to 1594740334, which is definitely not correct.

The statistics that BIND itself serves show no corresponding change in either the XML or the JSON output.

rootwyrm avatar Jul 14 '20 22:07 rootwyrm

The exporter only takes what BIND reports in the boot-time XML field. It's a reasonably simple XML parse into a Go time.Time.
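As a rough illustration of that kind of parse (not the exporter's actual code), the sketch below decodes a boot-time element and converts it to Unix seconds. The element layout (statistics > server > boot-time), the sample document, and the names statistics and bootTimeSeconds are assumptions made for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
	"time"
)

// sampleXML is a hypothetical, heavily trimmed statistics document.
const sampleXML = `<statistics version="3.11"><server>` +
	`<boot-time>2020-07-14T21:10:48.999Z</boot-time>` +
	`</server></statistics>`

// statistics mirrors only the part of the document this sketch needs.
// The element names are an assumption about BIND's v3 XML layout.
type statistics struct {
	Server struct {
		BootTime string `xml:"boot-time"`
	} `xml:"server"`
}

// bootTimeSeconds decodes doc and returns the boot time as Unix seconds.
func bootTimeSeconds(doc string) (float64, error) {
	var s statistics
	if err := xml.NewDecoder(strings.NewReader(doc)).Decode(&s); err != nil {
		return 0, err
	}
	t, err := time.Parse(time.RFC3339Nano, strings.TrimSpace(s.Server.BootTime))
	if err != nil {
		return 0, err
	}
	return float64(t.Unix()), nil
}

func main() {
	v, err := bootTimeSeconds(sampleXML)
	if err != nil {
		panic(err)
	}
	fmt.Printf("bind_boot_time_seconds %.0f\n", v)
}
```

In this sketch the result depends only on the input document, so an unchanged boot-time element always yields the same value. Whether the real exporter behaves the same way would need to be confirmed against its source.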

Without examining the raw metric data, it's hard to say what's going on.

SuperQ avatar Jul 15 '20 08:07 SuperQ

That's what I'm saying: somehow it's mangling the raw metric. I am positive of this. I checked the raw XML. The value in the XML is correct and, more importantly, does not change. Yet the exporter is changing the value from the XML in strange ways, and this is reflected in the raw data from the exporter.

rootwyrm avatar Jul 15 '20 18:07 rootwyrm

The only thing we could do here is to build a version that logs the raw XML data to see what's returned. Without some concrete proof that the exporter is doing something wrong, there's nothing we can do.

SuperQ avatar Jul 15 '20 19:07 SuperQ

I definitely agree; this is going to need some XML dumping. However, I don't see any way to do that, and I have zero experience working in Go (so frankly, I suspect my attempt would mangle the output).

It's also probably safe to go ahead and drop the xml.v2 channel completely, as that was fully discontinued with 9.10 (which went fully EOL in 2018). Maybe that would make debugging easier as well.

rootwyrm avatar Jul 16 '20 17:07 rootwyrm

That reminds me, we should add support for the new JSON format.

https://github.com/prometheus-community/bind_exporter/issues/82

SuperQ avatar Jul 16 '20 17:07 SuperQ

This is intriguing. Whilst working on a very trivial patch to eliminate ioutil.ReadAll when unmarshalling the XML, I hit a test failure which I haven't yet resolved. However, in the failed test output, there are bind_boot_time_seconds timestamps that are slightly shifted vs. what the tests expect. I cannot find anything in the test fixtures to explain this.

dswarbrick avatar Oct 15 '20 14:10 dswarbrick