amazon-eks-ami
Add support for raid10 on instance storage
Description of changes: I would like to be able to migrate workloads away from a node gracefully in case of an instance-storage drive failure. raid10 provides that redundancy, trading off half the usable disk space.
Adding support for creating raid10 in addition to raid0. This also removes the wait block for raid resync, for two reasons:
- raid0 does not have redundancy and therefore no initial resync [1]
- with raid10, the resync of 4x 1.9TB disks takes from tens of minutes to multiple hours, depending on the sysctl parameters dev.raid.speed_limit_min and dev.raid.speed_limit_max and on the speed of the disks. The initial resync for raid10 is not strictly needed [1]
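The level selection above can be sketched as a small shell helper; the function and variable names here are hypothetical, not the PR's actual code, and running it requires root plus real instance-store devices, so it is only defined:

```shell
# Sketch: create the array at a caller-chosen level (0 or 10) and return
# immediately, without waiting for /proc/mdstat to settle -- raid0 never
# resyncs, and the raid10 initial resync is not strictly needed [1].
create_raid() {
  local level="$1"; shift
  mdadm --create --force --verbose /dev/md/kubernetes \
    --level="$level" --name=kubernetes \
    --raid-devices="$#" "$@"
}

# Example invocation (root required):
# create_raid 10 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
```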
- filesystem creation: by default, mkfs.xfs attempts to TRIM the drive. This can also take tens of minutes or hours, depending on the size of the drives. TRIM can be skipped, as instances are delivered with their disks fully trimmed [2].
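Skipping TRIM at filesystem-creation time can be sketched as follows; mkfs.xfs's -K flag suppresses the discard pass. The wrapper name is hypothetical, and the command is destructive, so it is only defined here:

```shell
# Sketch: format the array without discarding blocks first (-K), since
# instance-store volumes arrive fully trimmed [2].
make_fs() {
  mkfs.xfs -K "$1"
}

# Example invocation (root required):
# make_fs /dev/md/kubernetes
```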
[1] https://raid.wiki.kernel.org/index.php/Initial_Array_Creation
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#InstanceStoreTrimSupport
Testing Done
Tested on an m6id.metal instance, first with kernel defaults:
# uname -a
Linux ip-10-24-0-65.eu-west-1.compute.internal 5.10.199-190.747.amzn2.x86_64 #1 SMP Sat Nov 4 16:55:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
# sysctl dev.raid.speed_limit_min
dev.raid.speed_limit_min = 1000
# sysctl dev.raid.speed_limit_max
dev.raid.speed_limit_max = 200000
# mdadm --create --force --verbose /dev/md/kubernetes --level=10 --name=kubernetes --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855337472K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/kubernetes started.
# cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
3710674944 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
[>....................] resync = 1.1% (41396352/3710674944) finish=304.3min speed=200910K/sec
bitmap: 28/28 pages [112KB], 65536KB chunk
With increased resync limits:
# sysctl -w dev.raid.speed_limit_min=2146999999 ; sysctl -w dev.raid.speed_limit_max=2146999999
dev.raid.speed_limit_min = 2146999999
dev.raid.speed_limit_max = 2146999999
# cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
3710674944 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
[===>.................] resync = 19.9% (740172096/3710674944) finish=20.4min speed=2418848K/sec
bitmap: 23/28 pages [92KB], 65536KB chunk