fs_setup/disk_setup: option to wait for the device to exist before continuing
This bug was originally filed in Launchpad as LP: #1832645
Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2019-06-12T20:48:53.539989+00:00
date_fix_committed = None
date_fix_released = None
id = 1832645
importance = medium
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1832645
milestone = None
owner = minfrin-y
owner_name = Graham Leggett
private = False
status = in_progress
submitter = minfrin-y
submitter_name = Graham Leggett
tags = []
duplicates = [1907080]
Launchpad user Graham Leggett(minfrin-y) wrote on 2019-06-12T20:48:53.539989+00:00
When using the AWS::EC2::Volume and AWS::EC2::VolumeAttachment options to add a volume to an AWS::EC2::Instance on AWS EC2, the volume is not immediately available on the instance.
This causes fs_setup and disk_setup to fail.
What would prevent this failure is a "wait" option on both fs_setup and disk_setup which, if true, would cause cloud-init to wait until the device exists (i.e. until AWS catches up and attaches the device) before continuing.
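Expressed as cloud-config, the request might look roughly like the sketch below. The wait and wait_timeout keys are purely hypothetical, used only to illustrate the requested behaviour; they are not existing disk_setup/fs_setup options:

disk_setup:
  /dev/xvdf:
    table_type: gpt
    layout: true
    overwrite: false
    wait: true           # hypothetical: block until /dev/xvdf appears
    wait_timeout: 300    # hypothetical: give up after 5 minutes
fs_setup:
  - device: /dev/xvdf
    filesystem: ext4
    label: data
    wait: true           # hypothetical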
Launchpad user Ryan Harper(raharper) wrote on 2019-07-18T20:48:32.739674+00:00
Hi,
Thanks for filing the bug. Could you describe the launch process in a bit more detail? Specifically, have the API calls to attach the volume run before the instance is booted? Is it that the volumes arrive after the instance has started booting? Are you providing your own cloud-config with fs_setup/disk_setup cloud-config? Or do these volumes show up in the EC2 metadata (block-device-mapping)?
If possible, can you run 'cloud-init collect-logs' as root and attach the tarball output on a failing instance?
Thanks!
Launchpad user Graham Leggett(minfrin-y) wrote on 2019-07-18T23:56:16.957762+00:00
We don't have a failing instance in this case, as the last time we tried this was a few years ago. The workaround was to not use the AWS::EC2::VolumeAttachment at all, but rather to create an EBS volume as part of the AWS::EC2::Instance. This has other side effects we want to avoid, thus this bug.
Look carefully at the definition of an AWS::EC2::VolumeAttachment:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-ebs-volumeattachment.html
A volume attachment depends on both a volume, and an instance. Both the volume and the instance have to exist before the volume attachment can exist. By definition that means that the instance is started up before the volume is attached to the instance.
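For illustration, a minimal CloudFormation fragment with this shape (resource names and property values are made up) looks like:

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-00000000        # placeholder
      InstanceType: t3.micro
      AvailabilityZone: eu-west-1a
  MyVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: eu-west-1a
      Size: 100
  MyAttachment:
    Type: AWS::EC2::VolumeAttachment
    Properties:
      Device: /dev/sdf
      InstanceId: !Ref MyInstance
      VolumeId: !Ref MyVolume

Because MyAttachment references both MyInstance and MyVolume, CloudFormation can only create the attachment once the instance is already running, i.e. after cloud-init has started processing its modules.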
If we instruct cloud-init to prepare the volume with a filesystem on it (and we want to), this will fail, because the attempt to prepare the volume happens before the volume has attached. Cloud-init fails, orchestration fails, and all is lost.
All we're looking for is an option for cloud-init to say: "I have been asked to set up this disk. This disk does not yet exist. Instead of throwing a fatal error and failing, I will wait until this disk does exist, and then continue to do what I need to do as normal, as if nothing had happened".
Launchpad user Launchpad Janitor(janitor) wrote on 2019-09-17T04:17:45.972557+00:00
[Expired for cloud-init because there has been no activity for 60 days.]
Launchpad user Maurizio(mauri-maurizio) wrote on 2020-03-04T11:33:02.065435+00:00
Hi all, we hit a very similar issue here. We couldn't tell whether a solution was ever proposed.
The workaround (not using AWS::EC2::VolumeAttachment at all, but rather creating an EBS volume as part of the AWS::EC2::Instance) is not ideal because we would like to keep that flexibility. As a possible solution, we tried waiting for disk mount completion with a conditional loop in bootcmd, but it seems asynchronous.
Do you have any suggestions?
Launchpad user James Thompson(james-thompson) wrote on 2020-09-02T20:49:32.988423+00:00
This would be useful to me as well.
Launchpad user Henry Ford(hrford) wrote on 2020-10-30T21:30:06.292679+00:00
/bin/cloud-init 19.3-3.amzn2
I have this same issue and now cannot use cloud-init's mounts module.
Occasionally cloud-init would not find the (late) attachment and then add "None" to the device column of fstab. This means, even if the instance is rebooted, the mount would not be retried by the system.
In CloudFormation, I'm not using an explicit volume attachment resource, but an implicit one.
Second to this, fs_setup also suffers from a similar issue and formats the volume if it's not attached at first and then attached shortly after (maybe a different bug).
For reference, and to help others, my work-around is to apply the logic in low-level bash (which I wanted to avoid):
runcmd:
  - '[ ! -b /dev/xvdf ] && (echo "ERROR: xvdf not attached. Will sleep 30s..."; sleep 30;)'
  # volume is formatted to ext4? OK, else format
  - blkid -o full /dev/xvdf | grep "ext4" && echo "xvdf is ext4" || mkfs -t ext4 /dev/xvdf -L label
  # no mountpoint? make it
  - "[ ! -d /mnt/ebs/ ] && mkdir /mnt/ebs"
  # volume is in fstab? if not: add it
  - 'grep -q "/dev/xvdf" /etc/fstab && echo "xvdf already in fstab" || echo "/dev/xvdf /mnt/ebs ext4 defaults,nofail 0 2" >> /etc/fstab'
  # mount volumes added above
  - mount -a
My work-around still provides hope even if the first boot missed the attachment, as fstab contains the correct info. It wouldn't, however, have solved an unformatted volume, because the second boot doesn't run runcmd.
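For comparison, the fstab line above corresponds roughly to the following mounts-module entry, which only works if the device is actually present by the time cc_mounts runs:

mounts:
  - [ /dev/xvdf, /mnt/ebs, ext4, "defaults,nofail", "0", "2" ]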
Launchpad user Dan Watkins(oddbloke) wrote on 2020-11-02T19:23:39.842761+00:00
Adding a way of configuring a wait at boot seems reasonable. Are any of the various people who've experienced this interested in contributing such a change?
Launchpad user Ryan Harper(raharper) wrote on 2020-12-07T16:53:03.047544+00:00
A fix is being worked on here: https://github.com/canonical/cloud-init/pull/710
I have the same problem: I want to use the disk_setup module to format an EBS volume, but the EBS volume may not be available right at boot.
My workaround is to use a bootcmd that pauses cloud-init until the given device is available:
bootcmd:
  # https://github.com/canonical/cloud-init/issues/3386
  - |
    filename_to_wait_for="/dev/nvme1n1"
    # Timeout in seconds
    timeout=600 # 10 minutes
    # Check every `interval` seconds
    interval=5
    elapsed=0
    while [ ! -e "$filename_to_wait_for" ]; do
      sleep "$interval"
      elapsed=$((elapsed + interval))
      if [ "$elapsed" -ge "$timeout" ]; then
        echo "Timeout reached. File not found: $filename_to_wait_for"
        exit 1
      fi
    done
    echo "File found: $filename_to_wait_for"
By default, the bootcmd module runs before the disk_setup module, so this makes sure the EBS volume at /dev/nvme1n1 exists before cloud-init tries to run the disk_setup module.
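For completeness, a minimal sketch of the config that then runs against the now-present device (the device path, label and mount point are illustrative, and this formats the whole device rather than partitioning it):

fs_setup:
  - device: /dev/nvme1n1
    filesystem: ext4
    label: data
    overwrite: false
mounts:
  - [ /dev/nvme1n1, /data, ext4, "defaults,nofail", "0", "2" ]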
https://github.com/canonical/cloud-init/pull/4673 should fix this, PTAL.