Feature request - Bitmap/Bitfield extraction support
A bunch of devices provide status or alarm bytes, where each bit corresponds to a different system or element. I'm currently playing with ViaLite products but I expect that it is a common problem. The desired data here is actually the bit values, such as bit 3, not the byte.
The standard stack handles this poorly.
- The SNMP device provides the byte when queried
- snmp_exporter fetches the byte and exposes it as a gauge metric
- Prometheus stores it and returns the byte on request
- Grafana and Alertmanager use PromQL, which returns the byte
There is discussion on adding bitwise operators to Prometheus, which would allow retrieving specific bits using PromQL (https://github.com/prometheus/prometheus/issues/14493), and VictoriaMetrics has basic bitwise operators. While using bitwise_and() would allow extracting a specific bit, it is an ugly solution and requires a query for each bit. Other existing PromQL query options are even worse.
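For reference, the per-bit extraction that a bitwise operator would perform is a shift and a mask. The following is a minimal Python sketch, not PromQL or exporter code; `bit_is_set` is a hypothetical helper name, and bit 0 is assumed to be the least significant bit of an integer value.

```python
def bit_is_set(value: int, bit: int) -> int:
    """Return 1 if the given bit of value is set, else 0 (bit 0 = least significant)."""
    return (value >> bit) & 1


# Example status byte 0b00001010: bits 1 and 3 are set.
status = 0b00001010
print(bit_is_set(status, 3))
print([bit_is_set(status, n) for n in range(8)])
```

Doing this at query time means one such expression per bit for every panel or alert rule, which is the overhead described above.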
I believe a much better solution is to solve it in snmp_exporter, much like the existing enum system. Each bit would be broken out into a different label. The resulting metric would be something like monStatus{bit="LaserStatus"} 1
This would require some work on the snmp_exporter code, plus support in the generator. I believe it is in line with the design philosophy of snmp_exporter: the functionality is conceptually similar to the existing scale and offset options. As it would be a new configuration option, there shouldn't be any backwards compatibility concerns.
I am happy to prepare a design proposal and pull request, but before I do the work I would like some input on whether this is desired, and on any issues which are obvious to the maintainers but probably not to me.
Related: https://github.com/prometheus/snmp_exporter/issues/855.
I think it would be great to have this be an option in the generator. We can extract bits similarly to how we do EnumAsStateSet.
Pulling draft requirements/design together.
Using upsHighPrecBatteryPackStatus and lgpPduEntrySysStatus as well-documented example fields.
The first is of type OCTET STRING and the second is Unsigned32, so a variety of types must be supported. Bits are in big-endian format; this is explicit in the lgpPduEntrySysStatus description and the SNMP standard. If a bit field is little-endian, the user can define the bit field in reverse to obtain the desired functionality.
upsHighPrecBatteryPackStatus OBJECT-TYPE
SYNTAX OCTET STRING
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The battery status for the pack only.
bit 0 Disconnected
bit 1 Overvoltage
bit 2 NeedsReplacement
bit 3 OvertemperatureCritical
bit 4 Charger
bit 5 TemperatureSensor
bit 6 BusSoftStart
bit 7 OvertemperatureWarning
bit 8 GeneralError
bit 9 Communication
bit 10 DisconnectedFrame
bit 11 FirmwareMismatch
lgpPduEntrySysStatus OBJECT-TYPE
SYNTAX Unsigned32
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"This value represents a bit-field of the various operational
states of the PDU. The value is a logical OR of all of the
following potential states of the PDU. Note the bit-position
is given parenthetically next to the operational state in the
description below. The bit position is assumed to be a big-endian
format (least significant digit is the right-most digit). The
state is present in the PDU when the bit is on (value = 1).
normalOperation(1)
The PDU is operating normally with no active warnings or alarms.
startUp(2)
The PDU is in the startup state (initializing). Control
and monitoring operations maybe inhibited or unavailable
while the PDU is in this state. This state will clear
automatically when the PDU(s) are fully initialized and
ready to accept control and monitoring commands.
normalWithWarning(8)
The PDU is operating normally with one or more active
warnings. Appropriate personnel should investigate the
warning(s) as soon as possible and take appropriate action.
normalWithAlarm(16)
The PDU is operating normally with one or more active
alarms. Appropriate personnel should investigate the alarm(s)
as soon as possible and take appropriate action.
abnormalOperation(32)
The PDU is operating abnormally. That is there is some
failure within the system that is unexpected under normal
operating conditions. Appropriate personnel should investigate
the cause as soon as possible. The normal functioning of
the system is likely inhibited.
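The parenthetical numbers in the lgpPduEntrySysStatus description (1, 2, 8, 16, 32) appear to be bit weights rather than zero-based positions. A small Python sketch of converting them to positions follows; `weight_to_bit` is a hypothetical helper, and treating the numbers as weights is an assumption drawn from the description's "logical OR" wording.

```python
def weight_to_bit(weight: int) -> int:
    """Convert a single-bit weight (1, 2, 4, 8, ...) to its zero-based bit position."""
    # A single-bit weight is a positive power of two.
    if weight <= 0 or weight & (weight - 1) != 0:
        raise ValueError(f"{weight} is not a single-bit weight")
    return weight.bit_length() - 1


# Weights copied from the lgpPduEntrySysStatus description.
states = {
    "normalOperation": 1,
    "startUp": 2,
    "normalWithWarning": 8,
    "normalWithAlarm": 16,
    "abnormalOperation": 32,
}
bit_values = {weight_to_bit(w): name for name, w in states.items()}
print(bit_values)
```

Under that assumption the states land on bits 0, 1, 3, 4 and 5, leaving bit 2 undefined.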
- The desired output is a label for each bit, much like what EnumAsStateSet produces
- There will be a new type option for snmp.yml, BitsAsStateSet
- Labels must be specified for each bit, in the new bit_values field
- Not all bits are mandatory. If a bit is not defined, its value will be ignored and no label will be created for it
- The first bit is specified as bit zero
- Label names must meet the Prometheus rules, but need not follow the best practices
- There will be no facility for detailed per-bit descriptions, as Prometheus provides no label description option
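The requirements above can be sketched in a few lines of Python. This is an illustration of the intended behaviour, not the exporter's Go code: `bits_as_state_set` and `render` are hypothetical names, bit 0 is assumed to be the least significant bit of an integer value, and reusing the metric name as the label name is an assumption about how the final design would name the label.

```python
from typing import Dict, List


def bits_as_state_set(value: int, bit_values: Dict[int, str]) -> Dict[str, int]:
    """Map each defined bit to 0 or 1; bits absent from bit_values are ignored."""
    return {name: (value >> bit) & 1 for bit, name in sorted(bit_values.items())}


def render(metric: str, samples: Dict[str, int]) -> List[str]:
    """Format samples as exposition-style lines, one per defined bit."""
    return [f'{metric}{{{metric}="{name}"}} {state}' for name, state in samples.items()]


# Bit positions for lgpPduEntrySysStatus; bit 2 is deliberately undefined.
bit_values = {
    0: "normalOperation",
    1: "startUp",
    3: "normalWithWarning",
    4: "normalWithAlarm",
    5: "abnormalOperation",
}

# 0b1101 has bits 0, 2 and 3 set; bit 2 produces no sample.
samples = bits_as_state_set(0b1101, bit_values)
for line in render("lgpPduEntrySysStatus", samples):
    print(line)
```

Every defined bit yields a series whether it is set or not, so an alert can match on a single label value without any bit arithmetic in the query.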
Example SNMP description
- name: upsHighPrecBatteryPackStatus
  oid: 1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.6
  type: BitsAsStateSet
  help: The battery status for the pack only - 1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.6
  indexes:
  - labelname: upsHighPrecBatteryPackIndex
    type: gauge
  - labelname: upsHighPrecBatteryCartridgeIndex
    type: gauge
  bit_values:
    0: Disconnected
    1: Overvoltage
    2: NeedsReplacement
    3: OvertemperatureCritical
    4: Charger
    5: TemperatureSensor
    6: BusSoftStart
    7: OvertemperatureWarning
    8: GeneralError
    9: Communication
    10: DisconnectedFrame
    11: FirmwareMismatch
- name: lgpPduEntrySysStatus
  oid: 1.3.6.1.4.1.476.1.42.3.8.20.1.25
  type: BitsAsStateSet
  help: This value represents a bit-field of the various operational states of
    the PDU - 1.3.6.1.4.1.476.1.42.3.8.20.1.25
  indexes:
  - labelname: lgpPduEntryIndex
    type: gauge
  lookups:
  - labels:
    - lgpPduEntryIndex
    labelname: lgpPduEntrySysAssignLabel
    oid: 1.3.6.1.4.1.476.1.42.3.8.20.1.15
    type: DisplayString
  bit_values:
    0: normalOperation
    1: startUp
    3: normalWithWarning
    4: normalWithAlarm
    5: abnormalOperation
@SuperQ is it possible to reuse the Bits type for this?
The snmp Bits type is handled by the bits function at https://github.com/prometheus/snmp_exporter/blob/main/collector/collector.go#L734
This function does almost exactly what I would like to do.
However, I believe it is constrained to the OctetString SNMP type. Specifically, I believe it is constrained to SnmpPdu objects which have an array value, which only occurs with the SNMP 0x04 BER tag used by the OCTET STRING and BITS types. Invoking the function by overriding a gauge type to Bits does not work, I believe because my underlying type was INTEGER, which causes the array length check to fail.
I could be very wrong, I'm new to SNMP and golang.
Modifying the bits function to support direct values as well as arrays will probably achieve the desired outcome.
It does confuse things with the underlying SNMP BITS type though.
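A rough sketch of what accepting both kinds of value could look like, written in Python for brevity rather than in the exporter's Go. `bit_state` is a hypothetical name. For octet strings it uses SNMP BITS numbering, where bit 0 is the most significant bit of the first octet; for plain integers it assumes bit 0 is the least significant bit, as in the Unsigned32 example earlier in this thread.

```python
from typing import Union


def bit_state(raw: Union[bytes, int], bit: int) -> int:
    """Return the state of a bit for either an octet string or a plain integer.

    Octet strings: bit 0 is the most significant bit of the first octet.
    Integers: bit 0 is the least significant bit (an assumption).
    Bits beyond the end of an octet string read as 0.
    """
    if isinstance(raw, (bytes, bytearray)):
        index, offset = divmod(bit, 8)
        if index >= len(raw):
            return 0
        return 1 if raw[index] & (0x80 >> offset) else 0
    return (raw >> bit) & 1


print(bit_state(b"\x80\x01", 0))   # first octet, most significant bit
print(bit_state(b"\x80\x01", 15))  # second octet, least significant bit
print(bit_state(9, 3))             # integer 0b1001, bit 3
```

The two branches number bits from opposite ends, which is one concrete form of the confusion with the BITS type: the same bit_values mapping would mean different physical bits depending on the underlying SNMP type.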
This sounds like a good feature. Here's another example use case, from Ross OpenGear:
openGearFrameStatus OBJECT-TYPE
SYNTAX Integer32
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"The overall hardware status of the frame. Zero means all OK.
Any nonzero value indicates an error condition. The status value
is a bitfield, where each bit represents a different error:
0x0001 - the fan door has been open too long.
0x0002 - psu fuse blown.
0x0004 - psu fault.
0x0008 - psu is overloaded
0x0010 - psu fan has stalled
0x0020 - a frame-door fan has stalled
0x0040 - error condition(s) reported by card(s)
The following additional bits apply to OG3-FR high power frame only:
0x0080 - the reference card is missing/faulty
0x0100 - psu AC missing
0x0200 - psu power off
0x0400 - psu is overloaded
0x0800 - card warning
"
::= { openGearFrameHardwareEntry 1 }