EBS volumes fail to format on nitro instances
While trying to set up a production cluster using the data node instance type m5a.4xlarge, I'm getting the following error from the couchbase-commons/mount-volume.sh script.
cat /opt/couchbase/var/lib/couchbase/logs/mock-user-data.log
Mounting EBS Volume for the data directory
2020-12-28 17:56:44 [INFO] [part-001] Creating ext4 file system on /dev/xvdh...
mke2fs 1.44.1 (24-Mar-2018)
The file /dev/xvdh does not exist and no size was specified.
There is no file in /dev for xvdh, but the AWS console does show that device name as attached. When I run lsblk on one of the instances, I see only the following:
NAME TYPE SIZE FSTYPE MOUNTPOINT LABEL
nvme1n1 disk 200G
nvme0n1 disk 50G
└─nvme0n1p1 part 50G ext4 / cloudimg-rootfs
I'm able to manually format the NVMe device using the mount_volume function, but the ASG fails to create instances when I change data_volume_device_name to /dev/nvme1n1:
Launching a new EC2 instance. Status Reason: Invalid device name /dev/nvme1n1. Launching EC2 instance failed.
[update] It appears that AWS Nitro-based instances no longer symlink the xvdX names for EBS devices, and the script doesn't account for this. 🤨
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#identify-nvme-ebs-device
Indeed. Using nvme is quite a bit more complicated:
- You have to detect if the system is using nvme using lsblk.
- You then run nvme list to get the list of nvme volumes (note: this requires the nvme utility to be installed).
- You then have to find the device you want using nvme id-ctrl.
- And then when mounting the volume, you have to mount it using its UUID, as that's the only thing consistent across reboots.
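The detection and lookup steps above can be sketched in shell roughly as follows. This is a minimal sketch, not what mount-volume.sh does today. The function names are hypothetical, it assumes nvme-cli is installed and the script runs as root, and it assumes the device name requested at attach time (for example xvdh) sits in the first 32 bytes of the vendor-specific area of the Identify Controller data, at byte offset 3072. It also globs /dev/nvme*n1 instead of parsing the output of nvme list.

```shell
#!/usr/bin/env bash

# Step 1: succeed if lsblk-style output on stdin lists any NVMe disk.
# Intended use: lsblk -d -n -o NAME | has_nvme_disks
has_nvme_disks() {
  grep -q '^nvme'
}

# Step 3, parsing half: read raw Identify Controller data on stdin and print
# the requested device name with padding, NUL bytes, and "/dev/" removed.
# ASSUMPTION: the name occupies bytes 3072..3103 of that data.
requested_name_from_id_ctrl() {
  local name
  name="$(dd bs=1 skip=3072 count=32 2>/dev/null | tr -d ' \000')"
  printf '%s\n' "${name#/dev/}"
}

# Step 3, lookup half: print the NVMe block device whose controller reports
# the requested name, or return 1 if none matches.
find_nvme_device() {
  local wanted="${1#/dev/}" dev name
  for dev in /dev/nvme*n1; do
    [ -b "$dev" ] || continue
    name="$(nvme id-ctrl --raw-binary "$dev" | requested_name_from_id_ctrl)"
    if [ "$name" = "$wanted" ]; then
      printf '%s\n' "$dev"
      return 0
    fi
  done
  return 1
}
```

With these in place, a call such as find_nvme_device /dev/xvdh would print the matching /dev/nvmeXn1 path, which is the device to format and mount. The name given to the ASG stays unchanged.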
I don't think we're going to be able to get a fix in soon. Does anyone have some cycles to submit a PR for this in the meantime?
Keep in mind that your step 3 assumes you can reuse the Terraform device name. However, this throws an error when the ASG tries to create the EBS volume. So your solution would have to create a second volume name and pass it forward through the functions.
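One way to read the "second volume name" idea is to keep the Terraform device name untouched for the ASG and resolve a separate OS-level path at boot. The sketch below is only an illustration under that assumption. resolve_os_device and its optional lookup argument are hypothetical names, and the lookup is whatever function maps a requested name to an NVMe path.

```shell
#!/usr/bin/env bash

# resolve_os_device REQUESTED_NAME [LOOKUP_FUNCTION]
# Prints the path to format and mount. REQUESTED_NAME is the name from the
# launch configuration (e.g. /dev/xvdh) and is never modified.
resolve_os_device() {
  local requested="$1" lookup="${2:-}"
  # Non-Nitro case: the requested name exists as a block device, use it as-is.
  if [ -b "$requested" ]; then
    printf '%s\n' "$requested"
    return 0
  fi
  # Nitro case: hand the requested name to a caller-supplied lookup function.
  if [ -n "$lookup" ]; then
    "$lookup" "$requested"
    return
  fi
  echo "cannot resolve $requested to a block device" >&2
  return 1
}
```

The rest of the script would then pass the resolved path, not the requested name, to the formatting and mounting functions.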
> Keep in mind that your step 3 assumes you can reuse the terraform device name. This however throws an error when ASG tries to create the ebs volume.
What error?
The ASG is unable to start the instances because EBS volume creation fails. I don't recall the exact wording. Give it a try.
We use the approach described in https://github.com/gruntwork-io/terraform-aws-couchbase/issues/73#issuecomment-755360785 (which we have in a private script) with a number of ASGs, and it works OK, so I'm not sure how to repro...
What device name are you using? I tried nvme1 and nvme1n1, and the instances failed to start due to a failed EBS volume.
Well, hopefully this issue and the closed PR will help someone overcome the hurdles in making these scripts production ready.
As I wrote above, we use the UUID, which we look up using blkid.
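For anyone following along, the UUID lookup could look something like the sketch below. It is not the private script mentioned above. The function names, the ext4 default, and the defaults,nofail mount options are assumptions, and it expects to run as root on a device that already has a file system.

```shell
#!/usr/bin/env bash

# Build an /etc/fstab line that mounts by UUID rather than by device path.
fstab_line_for_uuid() {
  local uuid="$1" mount_point="$2" fs_type="${3:-ext4}"
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Look up the file system UUID with blkid, mount by that UUID, and persist
# the mount in /etc/fstab unless an entry for the UUID is already there.
mount_by_uuid() {
  local device="$1" mount_point="$2" fs_type="${3:-ext4}" uuid
  uuid="$(blkid -s UUID -o value "$device")"
  if [ -z "$uuid" ]; then
    echo "no UUID found for $device" >&2
    return 1
  fi
  mkdir -p "$mount_point"
  mount "UUID=$uuid" "$mount_point"
  grep -q "UUID=$uuid" /etc/fstab ||
    fstab_line_for_uuid "$uuid" "$mount_point" "$fs_type" >> /etc/fstab
}
```

Because the fstab entry refers to the UUID, it stays valid even if the kernel enumerates the NVMe devices in a different order after a reboot.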
This repo is being archived, feel free to use a fork if necessary.