node_exporter
node_exporter copied to clipboard
timex status bitmap fields
This is sort of followup to https://github.com/prometheus/node_exporter/issues/2602
node_timex_status
is a bitfield from adjtimex. Unfortunately Prometheus does not have bitmap math operations, so it's hard to check for specific bits being set in the metric.
As a start, I'd like it to be easier to detect STA_UNSYNC 0x0040
being set.
I don't know if there is prior art in node_exporter
about the best way to represent bitfields.
Naively it might be exploding the single metric to unique metrics, one per flag.
Host operating system: output of uname -a
Linux (REDACTED) 5.19.0-43-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon May 22 13:39:36 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
What did you expect to see?
metrics for each flag
What did you see instead?
a single metric of a bitfield packed into an integer.
I would be happy to help implement this if a design is chosen.
The usual way to translate bitfields is to map each bit to a status string, and then have a series of bool values
Something like
node_timex_status{status="STA_UNSYNC"} 1
Since this is a bit of a breaking change, we would need to add it with a flag like --collector.timex.map-status-bits
.
Could we make it a non-breaking change by using a new name instead?
node_timex_status 8193
# HELP node_timex_status_flags Linux kernel timex.status bits, see adjtimex(2)
# TYPE node_timex_status_flags gauge
node_timex_status_flags{flag="STA_PLL"} 1
node_timex_status_flags{flag="STA_PPSFREQ"} 0
node_timex_status_flags{flag="STA_PPSTIME"} 0
node_timex_status_flags{flag="STA_FLL"} 0
node_timex_status_flags{flag="STA_INS"} 0
node_timex_status_flags{flag="STA_DEL"} 0
node_timex_status_flags{flag="STA_UNSYNC"} 0
node_timex_status_flags{flag="STA_FREQHOLD"} 0
node_timex_status_flags{flag="STA_PPSSIGNAL"} 0
node_timex_status_flags{flag="STA_PPSJITTER"} 0
node_timex_status_flags{flag="STA_PPSWANDER"} 0
node_timex_status_flags{flag="STA_PPSERROR"} 0
node_timex_status_flags{flag="STA_CLOCKERR"} 0
node_timex_status_flags{flag="STA_NANO"} 1
node_timex_status_flags{flag="STA_MODE"} 0
node_timex_status_flags{flag="STA_CLK"} 0
Sure, that works. Since this is a default enabled collector, perhaps we should still add a flag to either leave this off by default, or limit which status flags are exposed. This would limit the cardinality blowup.
The flags haven't changed since 2008, and the field is full, it would have to be another variable in the timex struct.
I implemented the above as a test with recording rules and a bit of math, and it's working as expected, with the 16 extra metrics. We could trim the cardinality by only emitting the non-zero bits (drop the bool
in my snippet below):
- name: node_timex_status_experiment
rules:
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0001*2)) - (node_timex_status % (0x0001))) > bool 0
labels:
flag: "STA_PLL"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0002*2)) - (node_timex_status % (0x0002))) > bool 0
labels:
flag: "STA_PPSFREQ"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0004*2)) - (node_timex_status % (0x0004))) > bool 0
labels:
flag: "STA_PPSTIME"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0008*2)) - (node_timex_status % (0x0008))) > bool 0
labels:
flag: "STA_FLL"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0010*2)) - (node_timex_status % (0x0010))) > bool 0
labels:
flag: "STA_INS"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0020*2)) - (node_timex_status % (0x0020))) > bool 0
labels:
flag: "STA_DEL"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0040*2)) - (node_timex_status % (0x0040))) > bool 0
labels:
flag: "STA_UNSYNC"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0080*2)) - (node_timex_status % (0x0080))) > bool 0
labels:
flag: "STA_FREQHOLD"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0100*2)) - (node_timex_status % (0x0100))) > bool 0
labels:
flag: "STA_PPSSIGNAL"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0200*2)) - (node_timex_status % (0x0200))) > bool 0
labels:
flag: "STA_PPSJITTER"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0400*2)) - (node_timex_status % (0x0400))) > bool 0
labels:
flag: "STA_PPSWANDER"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x0800*2)) - (node_timex_status % (0x0800))) > bool 0
labels:
flag: "STA_PPSERROR"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x1000*2)) - (node_timex_status % (0x1000))) > bool 0
labels:
flag: "STA_CLOCKERR"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x2000*2)) - (node_timex_status % (0x2000))) > bool 0
labels:
flag: "STA_NANO"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x4000*2)) - (node_timex_status % (0x4000))) > bool 0
labels:
flag: "STA_MODE"
- record: node_timex_status_flag
expr: ((node_timex_status % (0x8000*2)) - (node_timex_status % (0x8000))) > bool 0
labels:
flag: "STA_CLK"
Should we have bitmap opérations in PromQL?
@roidelapluie That would be neat..
Should we have bitmap opérations in PromQL?
Care to elaborate @roidelapluie if that is really being discussed?
I am likely dropping a brick here ... but Victoria Metrics did add this (https://github.com/VictoriaMetrics/VictoriaMetrics/commit/db1e62495be0435e2bcd75e77fe4925630679175) a while back and there a quite a few metrics (likely originating from some hardware registers) that are bitwise encoded.
@frittentheke Nobody has submitted an issue or PR to add it. But PromQL does not currently take hex value syntax, only decimal, as metric values must be float64. I think it would be a nice addition to be able to parse hex numbers, and do bitwise operations. It could open the door to making handling of enum values easier.
The metric from node_exporter is float64, and that that's fine here: node_timex_status 8193
PromQL itself permits hex numbers, so it's just adding bitwise ops
PromQL itself permits hex numbers, so it's just adding bitwise ops
@robbat2 Are you certain of this? Maybe you thought of the string literals (https://prometheus.io/docs/prometheus/latest/querying/basics/#string-literals)?
https://prometheus.io/docs/prometheus/latest/querying/basics/#float-literals really clearly shows hex numbers: 0[xX][0-9a-fA-F]+
https://prometheus.io/docs/prometheus/latest/querying/basics/#float-literals really clearly shows hex numbers:
0[xX][0-9a-fA-F]+
Yeah, somehow I did only search for "hex" and was done with that : 🙈
@ Did you see my issue about implementing bitwise operations for PromQL? https://github.com/prometheus/prometheus/issues/14493.
Would be really a cool addition.