
node_md_* does not show RAID syncing

Open · marcinhlybin opened this issue on Oct 21, 2020 · 3 comments

Host operating system: output of uname -a

Linux barman-01 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.0 (branch: HEAD, revision: b9c96706a7425383902b6143d097cf6d7cfd1960)
  build user:       root@3e55cc20ccc0
  build date:       20200526-06:01:48
  go version:       go1.14.3

node_exporter command line flags

Excerpt from the systemd service:

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=10.10.90.1:9100 \
  --collector.diskstats.ignored-devices='^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\\d+n\\d+p)\\d+$' \
  --collector.filesystem.ignored-mount-points='^/(sys|proc|dev|run)($|/)' \
  --collector.netdev.device-blacklist='^lo$' \
  --collector.textfile.directory=/var/lib/prometheus/node_exporter \
  --collector.netstat.fields='(.*)' \
  --collector.vmstat.fields='(.*)' \
  --collector.interrupts \
  --collector.processes \
  --collector.systemd \
  --collector.tcpstat

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

metrics:

root@barman-01 ~ # curl -Ss 10.10.90.1:9100/metrics|grep _md_
# HELP node_md_blocks Total number of blocks on device.
# TYPE node_md_blocks gauge
node_md_blocks{device="md0"} 1.046528e+06
node_md_blocks{device="md1"} 1.9530507264e+10
# HELP node_md_blocks_synced Number of blocks synced on device.
# TYPE node_md_blocks_synced gauge
node_md_blocks_synced{device="md0"} 1.046528e+06
node_md_blocks_synced{device="md1"} 1.9530507264e+10
# HELP node_md_disks Number of active/failed/spare disks of device.
# TYPE node_md_disks gauge
node_md_disks{device="md0",state="active"} 4
node_md_disks{device="md0",state="failed"} 0
node_md_disks{device="md0",state="spare"} 0
node_md_disks{device="md1",state="active"} 4
node_md_disks{device="md1",state="failed"} 0
node_md_disks{device="md1",state="spare"} 0
# HELP node_md_disks_required Total number of disks of device.
# TYPE node_md_disks_required gauge
node_md_disks_required{device="md0"} 4
node_md_disks_required{device="md1"} 4
# HELP node_md_state Indicates the state of md-device.
# TYPE node_md_state gauge
node_md_state{device="md0",state="active"} 1
node_md_state{device="md0",state="inactive"} 0
node_md_state{device="md0",state="recovering"} 0
node_md_state{device="md0",state="resync"} 0
node_md_state{device="md1",state="active"} 1
node_md_state{device="md1",state="inactive"} 0
node_md_state{device="md1",state="recovering"} 0
node_md_state{device="md1",state="resync"} 0

mdstat:

root@barman-01 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid6 sdb2[1] sdc2[2] sdd2[3] sda2[0]
      19530507264 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      [==============>......]  check = 73.4% (7173181200/9765253632) finish=273.0min speed=158203K/sec
      bitmap: 2/73 pages [8KB], 65536KB chunk

md0 : active raid1 sdb1[1] sdc1[2] sdd1[3] sda1[0]
      1046528 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>

What did you expect to see?

I expected to see a difference between the node_md_blocks and node_md_blocks_synced values. Currently the values are identical even though /proc/mdstat shows a sync in progress.

node_md_blocks{device="md1"} 1.9530507264e+10
node_md_blocks_synced{device="md1"} 1.9530507264e+10

I also expected the recovering and resync state metrics to be set to 1:

node_md_state{device="md0",state="recovering"} 0
node_md_state{device="md0",state="resync"} 0

marcinhlybin · Oct 21 '20 11:10

I took a look at the mdadm details and it seems that the array is in the checking state. I think it would be a good idea to add this state to the metrics.

However, when I check sysfs I can see the following sync progress information. In the fully operational state this file reports none. I think this value should be reflected in the node_md_blocks_synced metric:

root@barman-01 ~ # cat /sys/block/md1/md/sync_completed
15295250000 / 19530507264
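
For illustration, below is a minimal Go sketch (not the actual node_exporter or procfs code) of how these sysfs files could be read: sync_action reports the current operation (e.g. idle, check, resync, recover) and sync_completed reports progress as "<done> / <total>" or "none". The paths follow the output above; the exact units of sync_completed are left as an assumption.

```go
// Illustrative sketch only, not the node_exporter/procfs implementation.
// Assumes /sys/block/<dev>/md/sync_completed contains either "none"
// or "<done> / <total>"; the unit of those counters is not asserted here.
package main

import (
	"fmt"
	"os"
	"strings"
)

// readSyncCompleted parses /sys/block/<dev>/md/sync_completed.
func readSyncCompleted(dev string) (done, total uint64, syncing bool, err error) {
	raw, err := os.ReadFile("/sys/block/" + dev + "/md/sync_completed")
	if err != nil {
		return 0, 0, false, err
	}
	s := strings.TrimSpace(string(raw))
	if s == "none" {
		return 0, 0, false, nil
	}
	if _, err := fmt.Sscanf(s, "%d / %d", &done, &total); err != nil {
		return 0, 0, false, err
	}
	return done, total, true, nil
}

func main() {
	dev := "md1"

	// sync_action shows the operation in progress (idle, check, resync, ...).
	action, _ := os.ReadFile("/sys/block/" + dev + "/md/sync_action")
	fmt.Printf("%s sync_action: %s", dev, action)

	done, total, syncing, err := readSyncCompleted(dev)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if syncing {
		fmt.Printf("%s sync progress: %d / %d (%.1f%%)\n",
			dev, done, total, 100*float64(done)/float64(total))
	} else {
		fmt.Printf("%s is not syncing\n", dev)
	}
}
```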

mdadm details:

root@barman-01 /sys/block/md1/md # mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Tue Apr 28 10:56:42 2020
        Raid Level : raid6
        Array Size : 19530507264 (18625.74 GiB 19999.24 GB)
     Used Dev Size : 9765253632 (9312.87 GiB 9999.62 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Oct 21 13:43:29 2020
             State : active, checking
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

      Check Status : 78% complete

              Name : rescue:1
              UUID : d11ed962:b8848438:41411ae3:2e973bf6
            Events : 110843

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2

marcinhlybin · Oct 21 '20 11:10

There's a procfs PR to improve mdadm parsing (https://github.com/prometheus/procfs/pull/329). But, as @dswarbrick mentioned, we should probably add parsing for the new sysfs files.

SuperQ · Oct 24 '20 16:10

Thanks @dswarbrick for implementing the sysfs file parsing! Is there a chance this change will be adopted in node_exporter in the foreseeable future?

pznamensky · May 19 '23 14:05