
Add parser for mdadm

g-v-egidy opened this issue 2 years ago • 15 comments

Linux software RAID is usually implemented with the md kernel module and managed with the mdadm userspace tool. Unfortunately mdadm doesn't output its data in a format convenient for further automated processing, like JSON.

For example, when using mdadm --query --detail /dev/md0, the output looks like this:

/dev/md0:
           Version : 1.1
     Creation Time : Tue Apr 13 23:22:16 2010
        Raid Level : raid1
        Array Size : 5860520828 (5.46 TiB 6.00 TB)
     Used Dev Size : 5860520828 (5.46 TiB 6.00 TB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Jul 26 20:16:31 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : virttest:0
              UUID : 85c5b164:d58a5ada:14f5fe07:d642e843
            Events : 2193679

    Number   Major   Minor   RaidDevice State
       3       8       17        0      active sync   /dev/sdb1
       2       8       33        1      active sync   /dev/sdc1

mdadm also has the "examine" command, which gives information about an md raid member device:

# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x1
     Array UUID : 85c5b164:d58a5ada:14f5fe07:d642e843
           Name : virttest:0
  Creation Time : Tue Apr 13 23:22:16 2010
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 11721041656 sectors (5.46 TiB 6.00 TB)
     Array Size : 5860520828 KiB (5.46 TiB 6.00 TB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
   Unused Space : before=80 sectors, after=0 sectors
          State : clean
    Device UUID : 813162e5:2e865efe:02ba5570:7003165c

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 26 20:16:31 2022
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : f141a577 - correct
         Events : 2193679


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

Having a parser for this in jc would be really helpful when dealing with md raid in scripts.

g-v-egidy avatar Jul 26 '22 19:07 g-v-egidy

Thank you for the parser suggestion! Yeah, it probably makes sense to have a dedicated parser for this. In the meantime you could try the key/value parser (jc --kv). It won't be too fancy, but it might get the job done in the short term.

Edit: The kv parser will hiccup on the indented fields because it looks for indentation to consolidate lines (like a long text string). You'd have to first remove all leading spaces before running the output through jc for a better result.
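The workaround described above can be sketched without jc at all: strip the indentation, then do a naive key/value split. The parse_kv function here is a hypothetical stand-in for what jc --kv does, not jc's actual implementation.

```python
# Sketch of the suggested workaround: strip leading indentation so a
# naive key/value parser does not treat indented lines as continuations.
# parse_kv is a hypothetical stand-in for `jc --kv`, not jc's real code.
def parse_kv(text: str) -> dict:
    result = {}
    for line in text.splitlines():
        line = line.strip()              # remove the indentation mdadm adds
        if ':' not in line:
            continue
        # split on the FIRST colon only, so values containing colons
        # (timestamps, UUIDs) survive intact
        key, _, value = line.partition(':')
        result[key.strip()] = value.strip()
    return result

sample = """\
           Version : 1.1
        Raid Level : raid1
             State : clean
"""
print(parse_kv(sample))
```

Note that splitting on the first colon only is what keeps values like "Tue Apr 13 23:22:16 2010" or the UUIDs from being mangled.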

kellyjonbrazil avatar Jul 26 '22 19:07 kellyjonbrazil

Thanks for the quick reaction and suggesting jc --kv.

Unfortunately the most important information is often the tabular section at the end:

    Number   Major   Minor   RaidDevice State
       3       8       17        0      active sync   /dev/sdb1
       2       8       33        1      active sync   /dev/sdc1

Because this tells you which devices are part of the md array and which state they are in. I don't think kv will help much with this...
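For illustration, parsing the simple case of that table is mostly whitespace splitting; the hard part, as the rest of this thread shows, is the variants. This sketch assumes the happy path shown above (four numeric columns, then state words, then a device path) and is not jc's implementation.

```python
# Minimal sketch of parsing the device table at the end of
# `mdadm --detail` output. Assumes the simple case: four numeric
# columns, then one or more state words, then the device path.
# Real mdadm output has variants (spares, faulty/missing devices)
# that need extra handling.
def parse_device_table(lines):
    devices = []
    for line in lines:
        fields = line.split()
        number, major, minor, raid_device = (int(x) for x in fields[:4])
        *state, device = fields[4:]
        devices.append({
            'number': number,
            'major': major,
            'minor': minor,
            'raid_device': raid_device,
            'state': state,          # e.g. ['active', 'sync']
            'device': device,        # e.g. '/dev/sdb1'
        })
    return devices

rows = [
    '   3       8       17        0      active sync   /dev/sdb1',
    '   2       8       33        1      active sync   /dev/sdc1',
]
print(parse_device_table(rows))
```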

g-v-egidy avatar Jul 26 '22 20:07 g-v-egidy

Could you take a look at this library to see if the output works for you? If so, I can vendor this library into jc instead of writing a parser from scratch.

https://github.com/truveris/py-mdstat

Ah, nevermind - that's mdstat not mdadm. :)

kellyjonbrazil avatar Aug 01 '22 21:08 kellyjonbrazil

Ah, nevermind - that's mdstat not mdadm. :)

Yeah. I have seen this library before opening this issue.

It parses /proc/mdstat. mdstat gives some of the necessary data, but not all; for example, the UUIDs are missing. It also only gives information about currently active raid devices, so it does not cover the --examine mode, which can query devices that are offline.

If you seriously consider developing an mdadm parser, I could lend some help creating example output of different raid scenarios and states.

g-v-egidy avatar Aug 02 '22 08:08 g-v-egidy

If you could add more samples, that would be great. I will probably release a new version of jc in the next couple of weeks and I'd like to get this one in there. The samples are sometimes the hardest part to get, so that would be very helpful. Thanks!

kellyjonbrazil avatar Aug 02 '22 12:08 kellyjonbrazil

This sounds good. I will create a variety of sample output files in different states and raid levels in the next couple of days and upload them here.

g-v-egidy avatar Aug 02 '22 17:08 g-v-egidy

Doing some more research on this command. I don't have a raid setup so it's a little difficult for me to check out all of the output possibilities. I noticed this in the man page:

-Y, --export
             When used with --detail, --detail-platform, --examine, or
             --incremental output will be formatted as key=value pairs
             for easy import into the environment.

             With --incremental The value MD_STARTED indicates whether
             an array was started (yes) or not, which may include a
             reason (unsafe, nothing, no).  Also the value MD_FOREIGN
             indicates if the array is expected on this host (no), or
             seems to be from elsewhere (yes).

Can you show me what the --export output looks like? I'm thinking it could be used with the K/V or env parser. I'm not sure if it still includes the table output, though.

If there is still valuable data that is not exported in a machine-friendly format, it doesn't look like it would be too difficult to write this parser. I just need more samples so I can figure out the schema. For example, for most commands jc will output an array of objects. It looks like this command will only output a single item, so maybe single-object output will be better. I don't have a system to test whether there are scenarios where multiple devices can be examined/queried at the same time with this command.

kellyjonbrazil avatar Aug 07 '22 18:08 kellyjonbrazil

I was just getting started collecting example output for you, beginning with raid0.

Unfortunately the --export output is quite a letdown; it is missing most of the information.

Here is an example of the same array:

# mdadm -Q --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Sun Aug  7 20:56:52 2022
        Raid Level : raid0
        Array Size : 200704 (196.00 MiB 205.52 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Sun Aug  7 20:56:52 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

            Layout : -unknown-
        Chunk Size : 512K

Consistency Policy : none

              Name : sysrescue:0  (local to host sysrescue)
              UUID : 7e81a856:abb9c1c2:4b71237a:9778cc66
            Events : 0

    Number   Major   Minor   RaidDevice State
       0     254        1        0      active sync   /dev/vda1
       1     254        2        1      active sync   /dev/vda2

and this is what --export gives you:

# mdadm -Q --detail --export /dev/md0 
MD_LEVEL=raid0
MD_DEVICES=2
MD_METADATA=1.2
MD_UUID=7e81a856:abb9c1c2:4b71237a:9778cc66
MD_NAME=sysrescue:0
MD_DEVICE_dev_vda2_ROLE=1
MD_DEVICE_dev_vda2_DEV=/dev/vda2
MD_DEVICE_dev_vda1_ROLE=0
MD_DEVICE_dev_vda1_DEV=/dev/vda1

So I think we can forget using the --export mode.
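For completeness: the --export output itself is trivial to consume, since it is plain KEY=value pairs; the problem noted above is only how little of the detail output it contains. A minimal sketch (parse_export is an illustrative helper, not a jc function):

```python
# The --export output is plain KEY=value pairs, so it parses trivially
# even without jc. The catch, as noted above, is that it omits most of
# the information from the --detail output.
def parse_export(text: str) -> dict:
    result = {}
    for line in text.splitlines():
        if '=' in line:
            key, _, value = line.partition('=')
            result[key] = value
    return result

sample = """\
MD_LEVEL=raid0
MD_DEVICES=2
MD_UUID=7e81a856:abb9c1c2:4b71237a:9778cc66
"""
print(parse_export(sample))
```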

g-v-egidy avatar Aug 07 '22 19:08 g-v-egidy

No problem, here is what I have so far in dev:

$ cat mdadm-query-detail.out | jc --mdadm -p
{
  "device": "/dev/md0",
  "version": "1.1",
  "creation_time": "Tue Apr 13 23:22:16 2010",
  "raid_level": "raid1",
  "array_size": "5860520828 (5.46 TiB 6.00 TB)",
  "used_dev_size": "5860520828 (5.46 TiB 6.00 TB)",
  "raid_devices": 2,
  "total_devices": 2,
  "persistence": "Superblock is persistent",
  "intent_bitmap": "Internal",
  "update_time": "Tue Jul 26 20:16:31 2022",
  "state": "clean",
  "active_devices": 2,
  "working_devices": 2,
  "failed_devices": 0,
  "spare_devices": 0,
  "consistency_policy": "bitmap",
  "name": "virttest:0",
  "uuid": "85c5b164:d58a5ada:14f5fe07:d642e843",
  "events": 2193679,
  "device_table": [
    {
      "number": 3,
      "major": 8,
      "minor": 17,
      "state": [
        "active",
        "sync"
      ],
      "device": "/dev/sdb1",
      "raid_device": 0
    },
    {
      "number": 2,
      "major": 8,
      "minor": 33,
      "state": [
        "active",
        "sync"
      ],
      "device": "/dev/sdc1",
      "raid_device": 1
    }
  ],
  "array_size_num": 5860520828,
  "used_dev_size_num": 5860520828
}

$ cat mdadm-examine.out| jc --mdadm -p
{
  "device": "/dev/sdb1",
  "magic": "a92b4efc",
  "version": "1.1",
  "feature_map": "0x1",
  "array_uuid": "85c5b164:d58a5ada:14f5fe07:d642e843",
  "name": "virttest:0",
  "creation_time": "Tue Apr 13 23:22:16 2010",
  "raid_level": "raid1",
  "raid_devices": 2,
  "avail_dev_size": "11721041656 sectors (5.46 TiB 6.00 TB)",
  "array_size": "5860520828 KiB (5.46 TiB 6.00 TB)",
  "data_offset": 264,
  "super_offset": 0,
  "unused_space": "before=80 sectors, after=0 sectors",
  "state": "clean",
  "device_uuid": "813162e5:2e865efe:02ba5570:7003165c",
  "internal_bitmap": "8 sectors from superblock",
  "update_time": "Tue Jul 26 20:16:31 2022",
  "bad_block_log": "512 entries available at offset 72 sectors",
  "checksum": "f141a577 - correct",
  "events": 2193679,
  "device_role": "Active device 0",
  "array_state": "AA ('A' == active, '.' == missing, 'R' == replacing)",
  "array_size_num": 5860520828,
  "avail_dev_size_num": 11721041656,
  "unused_space_before": 80,
  "unused_space_after": 0,
  "array_state_list": [
    "active",
    "active"
  ]
}

These are from the two original samples. Let me know if that looks workable for you.
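As an illustration of the array_state_list field shown above, the mapping follows directly from the legend mdadm prints in the Array State line itself. This is a hypothetical sketch of the idea, not jc's actual code:

```python
# Sketch of deriving an array_state_list from mdadm's
# "Array State : AA ('A' == active, '.' == missing, 'R' == replacing)"
# line. The character mapping comes from mdadm's own legend.
STATE_MAP = {'A': 'active', '.': 'missing', 'R': 'replacing'}

def array_state_list(array_state: str) -> list:
    flags = array_state.split()[0]   # "AA" from "AA ('A' == active, ...)"
    return [STATE_MAP.get(c, c) for c in flags]

print(array_state_list("AA ('A' == active, '.' == missing, 'R' == replacing)"))
```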

Thanks!

kellyjonbrazil avatar Aug 07 '22 19:08 kellyjonbrazil

Here are many different states for raid0 and raid1: raid0-raid1.zip

There are some features of md raid I'm not familiar with, for example containers and clusters. Also, the different raid levels like 4, 5, and 6 are still missing. So there is still a lot to do.

But I think with the attached test cases you can make some progress.

g-v-egidy avatar Aug 07 '22 20:08 g-v-egidy

"state": "clean",

As you can see in my examples, several states can be active at once: State : clean, degraded, recovering. So this will probably have to be a list.
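The list conversion suggested here is a simple comma split; a sketch under that assumption (field name state_list is illustrative):

```python
# Sketch of splitting a multi-value State line into a list.
# The value "clean, degraded, recovering" is taken from the sample
# output discussed in this thread.
def state_list(state: str) -> list:
    return [s.strip() for s in state.split(',')]

print(state_list('clean, degraded, recovering'))
```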

When looking at your device table:

"device_table": [
    {
      "number": 3,
      [...]
    },
    {
      "number": 2,
      [...]
    }
  ],

It is a list of dicts. Isn't that quite a complicated structure? To me it looks like the devices always have a "number", but having a "raid_device" is optional and depends on whether a device is in use or a spare. So wouldn't it be better to use a dict with the "number" as the index?

When looking at a device in the table:

{
      "number": 3,
      "major": 8,
      "minor": 17,
      "state": [
        "active",
        "sync"
      ],
      "device": "/dev/sdb1",
      "raid_device": 0
    },

Unfortunately mdadm mixes states and flags in its output. See my examples with failfast and write-mostly; those are flags, not states. I'm not sure the states and flags can be properly separated in the output, though.

g-v-egidy avatar Aug 07 '22 20:08 g-v-egidy

I found some more examples in this bugzilla report, where users are complaining that the output format is inconsistent: https://bugzilla.redhat.com/show_bug.cgi?id=1380034

g-v-egidy avatar Aug 07 '22 20:08 g-v-egidy

I don't have a raid setup so it's a little difficult for me to check out all of the output possibilities.

Just an idea if you want to try it for yourself: use a virtual machine and add a second, empty virtual disk to it. Then you can use that to play around with the different setups. This is what I have been doing to create the output examples.

I've been using https://www.system-rescue.org/ as the Linux in the virtual machine, because you don't have to install anything; just add the iso image and it boots.

g-v-egidy avatar Aug 07 '22 20:08 g-v-egidy

Thanks! Working through these samples. I have an initial version in dev you can play with. I'll take a look at some of your feedback as well.

https://github.com/kellyjonbrazil/jc/blob/dev/jc/parsers/mdadm.py (you can put that in your plugin directory to test)

kellyjonbrazil avatar Aug 07 '22 20:08 kellyjonbrazil

Ok, I think I have addressed the outstanding issues. Could you test with the dev parser linked above?

As you can see in my examples, several states can be active at once: State : clean, degraded, recovering. So this will probably have to be a list.

Yep, added a state_list field.

To me it looks like the devices always have a "number", but having a "raid_device" is optional and depends on whether a device is in use or a spare. So wouldn't it be better to use a dict with the "number" as the index?

I don't think so, as I noticed that sometimes even number is null (e.g. query-raid1-faulty).

Unfortunately mdadm mixes states and flags in its output. See my examples with failfast and write-mostly; those are flags, not states. I'm not sure the states and flags can be properly separated in the output, though.

Yes, there are a few inconsistencies in the output. I think it makes the most sense to just append the flags to the device state in the table (as I have), since it seems impossible to tell which is a state vs. a flag.

Let me know if you run into any issues. Thanks!

kellyjonbrazil avatar Aug 07 '22 22:08 kellyjonbrazil

Sorry for taking so long to respond, I've been busy with other stuff.

Thank you very much for your work; I think this is mostly complete.

Here are some small things I found:

I think there is one issue with the "Chunk Size", available in RAID 0 and RAID 5. The size reported is 512K, but the JSON contains 512.

The "Events" field in query mode for 0.9 metadata contains values like "0.13", which is converted to 0 in the json. I think the events were encoded into major.minor event counts in the 0.9 metadata format. I suggest to detect these and output them as floats, while keeping integers for everything else.

It would be nice to get the hostname the array was created for (homehost). For 0.9 metadata this is reported behind the uuid field; for newer metadata it is reported behind the "name" field. Sometimes, when using the --homehost option, the hostname is not shown at all. I love their consistency... Could you try to extract the hostname from either of the fields and output it as a separate field when it is available?

I have done some more testing and created some containers and RAID 5 arrays: raid5-container.zip The containers show some more inconsistent output, especially in --examine mode. But I don't think containers play an important role in practice; most users will use the regular formats. So I would not invest too much time in getting everything out of the output.

g-v-egidy avatar Aug 16 '22 15:08 g-v-egidy

Thanks for the feedback! I'm working through these, but just a quick update on a couple things:

I think there is one issue with the "Chunk Size", available in RAID 0 and RAID 5. The size reported is 512K, but the JSON contains 512.

No problem - fixing this by keeping the original field a string and adding a chunk_size_num field that only contains the integer.

The "Events" field in query mode for 0.9 metadata contains values like "0.13", which is converted to 0 in the json. I think the events were encoded into major.minor event counts in the 0.9 metadata format. I suggest to detect these and output them as floats, while keeping integers for everything else.

I think I will need to parse this out a bit more, then. The problem with converting this to a float is that numbers like 0.10 will turn into 0.1. I'll probably detect whether there is a dot and then separate out the integers into their own fields, or something like that.
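The dot-detection approach described here can be sketched as follows. The field names (events_major, events_minor) are illustrative, not necessarily jc's final schema:

```python
# Sketch of the dotted-events handling described above: 0.9-metadata
# event counts like "0.13" are kept as a string (so "0.10" is not
# mangled into the float 0.1) and split into separate integer fields,
# while plain counts stay integers. Field names are illustrative.
def parse_events(value: str) -> dict:
    if '.' in value:
        major, _, minor = value.partition('.')
        return {'events': value,
                'events_major': int(major),
                'events_minor': int(minor)}
    return {'events': int(value)}

print(parse_events('0.13'))
print(parse_events('2193679'))
```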

kellyjonbrazil avatar Aug 16 '22 18:08 kellyjonbrazil

Ok - the latest version in dev is ready to test. I was able to incorporate many of your ideas, pull out a few more fields, and fix some of the tables in the new examples. I didn't bother converting fields with indexed key names, like unit[0]. If you find any important ones, let me know and I can build a quick function to grab some of those. Thanks for testing!

kellyjonbrazil avatar Aug 16 '22 21:08 kellyjonbrazil

I tested your latest version. It is looking very good to me now; it gets all the values I care about and some more.

Thank you for implementing this parser.

g-v-egidy avatar Aug 17 '22 13:08 g-v-egidy

Released in v1.21.0 available via pip. Binaries coming soon.

kellyjonbrazil avatar Aug 21 '22 21:08 kellyjonbrazil