node_exporter
                                
                        node_md_* does not show RAID syncing
Host operating system: output of uname -a
Linux barman-01 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of node_exporter --version
node_exporter, version 1.0.0 (branch: HEAD, revision: b9c96706a7425383902b6143d097cf6d7cfd1960)
  build user:       root@3e55cc20ccc0
  build date:       20200526-06:01:48
  go version:       go1.14.3
node_exporter command line flags
Excerpt from the systemd service:
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=10.10.90.1:9100 \
  --collector.diskstats.ignored-devices='^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\\d+n\\d+p)\\d+$' \
  --collector.filesystem.ignored-mount-points='^/(sys|proc|dev|run)($|/)' \
  --collector.netdev.device-blacklist='^lo$' \
  --collector.textfile.directory=/var/lib/prometheus/node_exporter \
  --collector.netstat.fields='(.*)' \
  --collector.vmstat.fields='(.*)' \
  --collector.interrupts \
  --collector.processes \
  --collector.systemd \
  --collector.tcpstat
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
metrics:
root@barman-01 ~ # curl -Ss 10.10.90.1:9100/metrics|grep _md_
# HELP node_md_blocks Total number of blocks on device.
# TYPE node_md_blocks gauge
node_md_blocks{device="md0"} 1.046528e+06
node_md_blocks{device="md1"} 1.9530507264e+10
# HELP node_md_blocks_synced Number of blocks synced on device.
# TYPE node_md_blocks_synced gauge
node_md_blocks_synced{device="md0"} 1.046528e+06
node_md_blocks_synced{device="md1"} 1.9530507264e+10
# HELP node_md_disks Number of active/failed/spare disks of device.
# TYPE node_md_disks gauge
node_md_disks{device="md0",state="active"} 4
node_md_disks{device="md0",state="failed"} 0
node_md_disks{device="md0",state="spare"} 0
node_md_disks{device="md1",state="active"} 4
node_md_disks{device="md1",state="failed"} 0
node_md_disks{device="md1",state="spare"} 0
# HELP node_md_disks_required Total number of disks of device.
# TYPE node_md_disks_required gauge
node_md_disks_required{device="md0"} 4
node_md_disks_required{device="md1"} 4
# HELP node_md_state Indicates the state of md-device.
# TYPE node_md_state gauge
node_md_state{device="md0",state="active"} 1
node_md_state{device="md0",state="inactive"} 0
node_md_state{device="md0",state="recovering"} 0
node_md_state{device="md0",state="resync"} 0
node_md_state{device="md1",state="active"} 1
node_md_state{device="md1",state="inactive"} 0
node_md_state{device="md1",state="recovering"} 0
node_md_state{device="md1",state="resync"} 0
mdstat:
root@barman-01 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md1 : active raid6 sdb2[1] sdc2[2] sdd2[3] sda2[0]
      19530507264 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      [==============>......]  check = 73.4% (7173181200/9765253632) finish=273.0min speed=158203K/sec
      bitmap: 2/73 pages [8KB], 65536KB chunk
md0 : active raid1 sdb1[1] sdc1[2] sdd1[3] sda1[0]
      1046528 blocks super 1.2 [4/4] [UUUU]
unused devices: <none>
What did you expect to see?
I expected to see a difference between the node_md_blocks and node_md_blocks_synced values, but they are currently identical even though /proc/mdstat shows a sync in progress:
node_md_blocks{device="md1"} 1.9530507264e+10
node_md_blocks_synced{device="md1"} 1.9530507264e+10
I also expected either the recovering or the resync state to be set to 1 for the syncing device, but both are 0:
node_md_state{device="md1",state="recovering"} 0
node_md_state{device="md1",state="resync"} 0
I took a look at the mdadm details and it seems that the array is in a checking state. I think it would be a good idea to add this state to the metrics as well.
However, when I check sysfs, I can see the following syncing information (in a fully operational state this file reports none). I think node_md_blocks_synced should reflect this value:
root@barman-01 ~ # cat /sys/block/md1/md/sync_completed
15295250000 / 19530507264
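For reference, here is a minimal Go sketch of how these sysfs files could be read; readSyncStatus and its error handling are my own illustration, not node_exporter's actual collector code, and it assumes sync_action and sync_completed both live under /sys/block/<dev>/md/. Reading sync_action alongside sync_completed would also make the checking state visible:

package main

// Sketch only: read /sys/block/<dev>/md/sync_action and sync_completed.
// sync_action reports the current action (e.g. "idle", "check", "resync"),
// while sync_completed reports "<done> / <total>" during an action and
// "none" otherwise.

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

func readSyncStatus(device string) (action string, done, total int64, err error) {
	base := filepath.Join("/sys/block", device, "md")

	raw, err := os.ReadFile(filepath.Join(base, "sync_action"))
	if err != nil {
		return "", 0, 0, err
	}
	action = strings.TrimSpace(string(raw))

	raw, err = os.ReadFile(filepath.Join(base, "sync_completed"))
	if err != nil {
		return "", 0, 0, err
	}
	completed := strings.TrimSpace(string(raw))
	if !strings.Contains(completed, "/") {
		// e.g. "none" when no action is in progress
		return action, 0, 0, nil
	}
	parts := strings.SplitN(completed, "/", 2)
	done, _ = strconv.ParseInt(strings.TrimSpace(parts[0]), 10, 64)
	total, _ = strconv.ParseInt(strings.TrimSpace(parts[1]), 10, 64)
	return action, done, total, nil
}

func main() {
	action, done, total, err := readSyncStatus("md1")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("action=%s completed=%d/%d\n", action, done, total)
}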
mdadm details:
root@barman-01 /sys/block/md1/md # mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Tue Apr 28 10:56:42 2020
        Raid Level : raid6
        Array Size : 19530507264 (18625.74 GiB 19999.24 GB)
     Used Dev Size : 9765253632 (9312.87 GiB 9999.62 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent
     Intent Bitmap : Internal
       Update Time : Wed Oct 21 13:43:29 2020
             State : active, checking
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0
            Layout : left-symmetric
        Chunk Size : 512K
Consistency Policy : bitmap
      Check Status : 78% complete
              Name : rescue:1
              UUID : d11ed962:b8848438:41411ae3:2e973bf6
            Events : 110843
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2
There's a procfs PR to improve mdstat parsing (https://github.com/prometheus/procfs/pull/329), but as @dswarbrick mentioned, we should probably also add parsing for the new sysfs files.
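As a rough illustration of what that parsing involves (this is not the code from the PR; progressRe and parseProgress are made-up names for this sketch), the action and its synced/total counters can be pulled out of the /proc/mdstat progress line shown above:

package main

// Sketch only: extract the action name and the (synced/total) counters from a
// /proc/mdstat progress line such as:
//   [==============>......]  check = 73.4% (7173181200/9765253632) finish=273.0min speed=158203K/sec

import (
	"fmt"
	"regexp"
	"strconv"
)

var progressRe = regexp.MustCompile(`(check|resync|recovery|reshape) *= *[0-9.]+% \((\d+)/(\d+)\)`)

func parseProgress(line string) (action string, synced, total int64, ok bool) {
	m := progressRe.FindStringSubmatch(line)
	if m == nil {
		return "", 0, 0, false
	}
	synced, _ = strconv.ParseInt(m[2], 10, 64)
	total, _ = strconv.ParseInt(m[3], 10, 64)
	return m[1], synced, total, true
}

func main() {
	line := "  [==============>......]  check = 73.4% (7173181200/9765253632) finish=273.0min speed=158203K/sec"
	if action, synced, total, ok := parseProgress(line); ok {
		fmt.Printf("action=%s synced=%d total=%d\n", action, synced, total)
	}
}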
Thanks @dswarbrick for implementing the sysfs file parsing! Is there a chance this change will be adopted in node_exporter in the foreseeable future?