node_exporter
node_exporter copied to clipboard
xvda[0-9]+ disks (not partitions) are ignored by default
Host operating system: output of uname -a
Linux 5.3.18-24.64-default
node_exporter version: output of node_exporter --version
node_exporter, version (branch: , revision: )
build user:
build date:
go version: go1.14.7
node_exporter command line flags
none
Are you running node_exporter in Docker?
no
What did you do that produced an error?
default config does not collect disk stats
What did you expect to see?
all disk devices should be collected by default
What did you see instead?
No disk stats
Since #13 got merged, xvda[0-9]+ are ignored. However, it is not correct to assume that those are always partitions. Xen allows a machine to inform "virtual partition", which are mapped to dedicated devices (it makes expanding a filesystem trivial).
This is a normal xvda-disk with partitions (possibly on what that PR was based on):
# ls /sys/block/xvda* -l
lrwxrwxrwx 1 root root 0 Mai 17 14:46 /sys/block/xvda -> ../devices/vbd-51712/block/xvda
# grep xvda /proc/diskstats
202 0 xvda 26438 52 1897369 17235 4263982 53800 47477992 3542063 0 103104 3557922
202 1 xvda1 22 0 176 7 0 0 0 0 0 7 7
202 2 xvda2 118 0 13522 94 129 3 30326 274 0 42 367
202 3 xvda3 26268 52 1880583 17122 4263853 53797 47447666 3541789 0 103052 3557536
And this is a "virtual-partitions" machine:
# ls /sys/block/xvd* -d1
/sys/block/xvda1
/sys/block/xvda2
# grep xvda /proc/diskstats
202 1 xvda1 66701 3 13340088 570748 1833 254 308680 72 0 32440 674096
202 2 xvda2 101 0 4448 0 0 0 0 0 0 4 4
xvda[0-9]+ should not be excluded by default only based on their name. At least in Linux, this is the easy but wrong way to check if a device is a partition or a disk. node_exporter could check if /sys/block/
If collector.diskstats.ignored-devices, by default, only excludes partitions, I think a new option like: "collector.diskstats.ignore-partitions (default: true)" should be used instead, leaving "ignored-devices" empty by default.
As a suggestion, wouldn't it be better to iterate over "/sys/block/*/stat"? It is mostly the same format. It has the same order/meaning but it looks like it uses fixed-length fields (safer? faster?).
Would a PR changing from /proc/diskstats to /sys/block/*/stat be accepted?
Parsing /sys/block/*/stat
might be a good idea. But some users do want partition-level stats. We use the regexp method to provide the most generic way to filter exactly what the user wants to know without second guessing them.
It's also possible to scan /sys/block/$DEV/$DEV*/stat
for partition data. But either way, this would require completely re-designing the flag layout for this collector. It also changes the consistency of the timestamps that we get from parsing /proc/diskstats
This would be a breaking change for the existing behavior. It could be implemented as a completely new collector, this way existing users would continue to see the existing behavior. We could call it --collector.blockdev
or something. It could become the default implementation for diskstats
in 2.x.
In order to do this, we first need a /sys/block
parser in https://github.com/prometheus/procfs.
It's also possible to scan
/sys/block/$DEV/$DEV*/stat
for partition data. But either way, this would require completely re-designing the flag layout for this collector. It also changes the consistency of the timestamps that we get from parsing/proc/diskstats
The "timestamp consistency" is not guaranteed as /proc/diskstats does not use locks. It is expected, although not frequent, that partitions metrics sum does not match disk one. Reading individual files might increase this discrepancy but I think not to a unacceptable magnitude.
This would be a breaking change for the existing behavior. It could be implemented as a completely new collector, this way existing users would continue to see the existing behavior. We could call it
--collector.blockdev
or something. It could become the default implementation fordiskstats
in 2.x.In order to do this, we first need a
/sys/block
parser in https://github.com/prometheus/procfs.
I created a PR proposing the change in two steps. The first one will not change the expected output (except for a minor delay between devices). The second one might break when someone was collecting partition data.