Better cleanup of LVM metadata
- Do not use small blocks. 1M writes should be far more efficient.
- Use direct I/O - there is no need to cache these writes, and direct I/O does not require a separate sync.
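To illustrate the intended difference (an untested sketch; the device path and offsets are made up):

```
# 2048 separate 512-byte writes through the page cache, then a sync.
dd if=/dev/zero of=/dev/sdX bs=512 count=2048 seek=2048
sync

# One 1 MiB write straight to the device: a single I/O, no sync required.
# Note that oflag=direct needs the offset and length to be multiples of
# the logical sector size (usually 512 bytes).
dd if=/dev/zero of=/dev/sdX bs=1M count=1 seek=1 oflag=direct
```

Note that seek= counts in units of bs, so seek=2048 with bs=512 and seek=1 with bs=1M land on the same 1 MiB byte offset.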
Note: this wasn't tested. I don't have a system to test on (need to set up a virtual environment, perhaps!)
Hmm, I need to understand this 'seek' thing. I'm not sure a 1M write will be aligned if an offset is being used - direct I/O requires the offset to be sector-aligned!
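To make the alignment concern concrete (untested; the loop device is hypothetical), direct I/O rejects writes whose byte offset is not a multiple of the logical sector size:

```
# Offset 1048577 is not 512-aligned; with oflag=direct the kernel typically
# rejects this with EINVAL ("Invalid argument").
dd if=/dev/zero of=/dev/loop0 bs=512 count=4 seek=1048577 oflag=seek_bytes,direct

# Offset 1048576 (exactly 1 MiB) is 512-aligned, so the same write succeeds.
dd if=/dev/zero of=/dev/loop0 bs=512 count=4 seek=1048576 oflag=seek_bytes,direct
```

So whatever the seek expression computes, it has to stay a multiple of 512.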
I can see how this might make the cleanup quicker.
This bit has been an interesting challenge. The trouble is that if you re-deploy to a system you have previously deployed to, and you have not done a proper lvremove/vgremove/pvremove cleanup, residual LVM metadata remains at the partition offset of the cache device. When the deployment then creates new partitions at that same offset, udev sees the residual metadata and attempts to import it, resulting in a corrupted LVM configuration that causes play failures when the LVM plays attempt to establish the block stack.
The seek here computes that partition offset before we create the partition, and the dd then forcefully overwrites a chunk of the disk at that offset to destroy the stale LVM metadata.
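For what it's worth, here is a rough way to see the stale label on disk (the device name and the 1 MiB partition-start offset are assumptions, not from the play):

```
# The PV label lives in the second 512-byte sector of the PV. If the old
# partition started at sector 2048 (the common 1 MiB boundary), the label
# would sit at sector 2049 of the whole disk.
dd if=/dev/sdb bs=512 skip=2049 count=1 2>/dev/null | strings | grep LABELONE

# If the partition node exists, wipefs (run without -a) just lists any
# signatures it recognizes there, which confirms the leftover metadata.
wipefs /dev/sdb1
```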
Surely by 4MB you've already overwritten everything? And we could do more than 4MB easily if we use dd properly. (But perhaps there's something at the end as well? Shouldn't there be some metadata at the 'far end' too?)
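On the 'far end' question: as far as I know, LVM only puts metadata at the end of the device if you explicitly ask for a second metadata area; the default is a single copy at the front. Roughly:

```
# Default: a single metadata area at the start of the PV; nothing at the end.
pvcreate /dev/sdb1

# Only with an explicit second copy does metadata land at the end of the
# device, so only then would the 'far end' need wiping too.
pvcreate --pvmetadatacopies 2 /dev/sdb1
```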
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/logical_volume_manager_administration/lvm_metadata
> By default, the pvcreate command places the physical volume label in the 2nd 512-byte sector. This label can optionally be placed in any of the first four sectors, since the LVM tools that scan for a physical volume label check the first 4 sectors. The physical volume label begins with the string LABELONE.
I think this is the metadata that is problematic for us. Since it is created relative to the location of the beginning of the PV, if that PV is a partition then the metadata is located at the partition offset.
According to this, we should only need to be concerned with the first four sectors - effectively the first 2 KB of any partition. So I think:
shell: /bin/bash -c "/usr/bin/dd if=/dev/zero of={{ item[0] }} bs=512 count=4 seek={{ 512 * (arbiter_end|default(50)|int + (item[1]['id']|int * cache_part_size|int)) }} oflag=seek_bytes,direct"
would do the job just fine. But if bs=2K count=1 or even bs=1M count=1 is more efficient from an I/O standpoint, that should be fine too, as long as the seek offset stays a multiple of 512 so oflag=direct keeps working.
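For instance, a hypothetical bs=1M variant of the same task (same variables, equally untested):

```
# One 1 MiB direct write instead of four 512-byte ones. The computed byte
# offset is 512 times an integer, so it satisfies the O_DIRECT alignment
# requirement; the 1 MiB length is sector-aligned as well.
shell: /bin/bash -c "/usr/bin/dd if=/dev/zero of={{ item[0] }} bs=1M count=1 seek={{ 512 * (arbiter_end|default(50)|int + (item[1]['id']|int * cache_part_size|int)) }} oflag=seek_bytes,direct"
```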