varnish5_backend produces errors when updating metrics, `Invalid DS Format`
Hi, I've started using the `varnish5_` plugin to graph backend status with the "backend" aspect (i.e. linking the plugin as `varnish5_backend`), and I'm seeing errors like this in `munin_update.log`:
2021/02/19 20:26:17 [INFO] creating rrd-file for varnish5_backend->VBE_reload_20210219_221640_5516_aegir0_happy: '/var/lib/munin/koumbit.net/cache2.koumbit.net-varnish5_backend-VBE_reload_20210219_221640_5516_aegir0_happy-b.rrd'
2021/02/19 20:26:17 [ERROR] Unable to create '/var/lib/munin/koumbit.net/cache2.koumbit.net-varnish5_backend-VBE_reload_20210219_221640_5516_aegir0_happy-b.rrd': invalid DS format
2021/02/19 20:26:17 [ERROR] In RRD: Error updating /var/lib/munin/koumbit.net/cache2.koumbit.net-varnish5_backend-VBE_reload_20210219_221640_5516_aegir0_happy-b.rrd: opening '/var/lib/munin/koumbit.net/cache2.koumbit.net-varnish5_backend-VBE_reload_20210219_221640_5516_aegir0_happy-b.rrd': No such file or directory
And of course the graphs don't show up, since the RRD files are missing.
When I run the plugin, I get enormous values like this, which probably explains why Munin rejects them:
# munin-run varnish5_backend
VBE_reload_20210219_221437_19502_aegir0_happy.value 18446744073709551615
[...]
These values come straight from `varnishstat`:
# varnishstat -x | grep -A2 aegir0.happy
<name>VBE.reload_20210219_221437_19502.aegir0.happy</name>
<value>18446744073709551615</value>
<flag>b</flag>
According to the upstream documentation, the `happy` field is a bitfield:
https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#vbe-backend-counters
So the `varnish5_` plugin should parse the value and massage it into something that RRD/Munin can digest, perhaps the percentage of successful probes.
... but the documentation doesn't explain what the bits mean :frowning:
If I extrapolate from the ncurses output of `varnishstat` without arguments, it must be a list of all probes and whether or not each one succeeded, but we'll need to figure out how to interpret the values to make something useful out of them.
Ah, but if I convert the huge number to binary, I get exactly as many 1s as there are "H" characters on the "Happy" line that Varnish prints when checking backend status with `varnishadm -S /etc/varnish/secret -T127.0.0.1:6082 backend.list -p`. If we're lucky, it's just one bit per probe. I'll run some tests to confirm this.
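The comparison described above can be sketched in Python. `ones_match_h` is a hypothetical helper, and the "Happy" line is assumed to be passed in as a plain string copied from the `backend.list -p` output. Note that 18446744073709551615 is exactly 2^64 - 1, i.e. all 64 bits set, which matches a line of 64 "H" characters.

```python
def ones_match_h(happy: int, happy_line: str) -> bool:
    """Compare the number of 1 bits in `happy` with the number of "H"
    characters on the "Happy" line printed by `backend.list -p`."""
    return format(happy, "b").count("1") == happy_line.count("H")


happy = 18446744073709551615
print(happy == 2**64 - 1)             # True: all 64 bits are set
print(ones_match_h(happy, "H" * 64))  # True: a fully healthy 64-probe line
```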
With a probe mask that looks like this:
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH----HH
I get a "happy" value of 18446744073709551555, which looks like this in binary:
1111111111111111111111111111111111111111111111111111111111000011
So it's exactly what I thought: one bit per probe, with the least significant bit representing the most recent probe and the most significant bit representing the oldest one.
This gives a good starting point for modifying the plugin so that it actually works.
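The bit layout found above can be checked with a short Python sketch that turns the number back into an "H"/"-" string (most significant bit on the left as the oldest probe, least significant bit on the right as the most recent one) and reads the latest probe from bit 0. Both function names are hypothetical, and the 64-bit width is assumed from the values in this issue.

```python
def render_probes(happy: int, width: int = 64) -> str:
    """Render the bitfield like the "Happy" line: oldest probe on the left
    (most significant bit), most recent on the right (least significant bit)."""
    bits = format(happy & ((1 << width) - 1), f"0{width}b")  # zero-padded
    return bits.replace("1", "H").replace("0", "-")


def last_probe_ok(happy: int) -> bool:
    """The least significant bit is the most recent probe."""
    return bool(happy & 1)


happy = 18446744073709551555
print(render_probes(happy))   # 58 "H", then "----", then "HH"
print(last_probe_ok(happy))   # True
```

The 4 failed probes clear bits 2 to 5, i.e. 4 + 8 + 16 + 32 = 60, which is exactly the difference between 18446744073709551615 and 18446744073709551555.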
I'm having trouble implementing this, though: the plugin is written in a very convoluted way, and from what I understand there isn't a good place to transform data, except maybe in the `xml_characters` sub. I'll probably need help to implement this, since I'm not very well versed in Perl.
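The plugin itself is Perl, so the following is not a patch for it; it is a Python sketch of the same idea: convert the `happy` bitfield at the point where each counter is read from `varnishstat -x`, instead of handing the raw 64-bit value to Munin. It assumes each counter is wrapped in a `<stat>` element with `<name>`, `<value>` and `<flag>` children (the wrapper is an assumption; the three children match the grep output above). `munin_values` is a hypothetical name, and the percentage again assumes one bit per probe.

```python
import xml.etree.ElementTree as ET

# Trimmed varnishstat -x output. The <varnishstat>/<stat> wrappers are
# assumptions; name, value and flag are copied from this issue.
SAMPLE = """<varnishstat>
  <stat>
    <name>VBE.reload_20210219_221437_19502.aegir0.happy</name>
    <value>18446744073709551615</value>
    <flag>b</flag>
  </stat>
</varnishstat>"""


def munin_values(xml_text: str, width: int = 64) -> dict:
    """Map counter names to values Munin can store.

    Bitfield counters named *.happy become the percentage of set bits
    (assumed: one bit per probe, 1 = successful probe).
    """
    values = {}
    for stat in ET.fromstring(xml_text).iter("stat"):
        name = stat.findtext("name")
        value = int(stat.findtext("value"))
        if stat.findtext("flag") == "b" and name.endswith(".happy"):
            ones = format(value & ((1 << width) - 1), "b").count("1")
            value = 100.0 * ones / width
        values[name] = value
    return values


print(munin_values(SAMPLE))
```

Keying the conversion on the flag `b` plus the `.happy` suffix leaves every other counter untouched, so the rest of the plugin's output would not change.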
@kjetilho, since you worked on the `varnish5_` plugin, you probably understand the code much better than I do.
TL;DR: when I use the `backend` aspect, the state of the backends is not graphed correctly and the RRD files are not created. From what I could see, the numeric value output by `varnishstat -x` is a bitmask, and the `varnish5_` plugin needs to transform that number into something more digestible for graphs.
Do you know where in the plugin we could implement that value transformation in order to fix the `backend` aspect?
Maybe we could simply discard this bit field value?
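If discarding is preferred, the filter amounts to skipping every counter whose `<flag>` is `b`. This is a Python sketch rather than a change to the Perl plugin; it assumes each counter in `varnishstat -x` output sits in a `<stat>` element with `<name>`, `<value>` and `<flag>` children, `drop_bitfields` is a hypothetical name, and the second counter in the sample uses a placeholder value for illustration.

```python
import xml.etree.ElementTree as ET

# The <stat> wrapper is an assumption; the bereq_hdrbytes value is a placeholder.
SAMPLE = """<varnishstat>
  <stat>
    <name>VBE.reload_20210219_221437_19502.aegir0.happy</name>
    <value>18446744073709551615</value>
    <flag>b</flag>
  </stat>
  <stat>
    <name>VBE.reload_20210219_221437_19502.aegir0.bereq_hdrbytes</name>
    <value>0</value>
    <flag>c</flag>
  </stat>
</varnishstat>"""


def drop_bitfields(xml_text: str) -> dict:
    """Return name -> value for every counter except bitfields (flag "b")."""
    kept = {}
    for stat in ET.fromstring(xml_text).iter("stat"):
        if stat.findtext("flag") == "b":
            continue  # e.g. VBE.*.happy: a probe bitmap, not a graphable number
        kept[stat.findtext("name")] = int(stat.findtext("value"))
    return kept


print(drop_bitfields(SAMPLE))
```

The trade-off is that the backend health information disappears from the graphs entirely, whereas converting the bitfield to a percentage keeps it.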