ebs-automatic-nvme-mapping
script and udev rule for deterministic ephemeral devices
This was tested on systemd-udev
https://www.freedesktop.org/software/systemd/man/udev.html
Worked on this with @thecubed
I'm not intimately familiar with the sequence of operations, but isn't there a potential for a conflict if two instances of the next ephemeral script are executed simultaneously?
i.e. two EBS volumes are attached, udevd executes next ephemeral for each, and there are two instances trying to grab /dev/ephemeral1
...
You could use a lock file to protect against this but then you end up in a scenario where one invocation can run and succeed, but the other one fails. How would udevd handle that failure (or if not handled, how is that failure communicated)?
Hi @oogali, thanks for getting back to me.
but isn't there a potential for a conflict if two instances of the next ephemeral script are executed simultaneously?
Yes, you are right, this could happen. I think the code below could solve this concern:
#!/usr/bin/env bash
# To be used with the udev rule: /etc/udev/rules.d/999-aws-ebs-nvme.rules

# Wait for an existing lock file to disappear (up to 5 checks, with increasing delay)
script_name="$(basename "$0")"
pid_file="/tmp/${script_name}.lock"
counter=0
until [[ ${counter} -eq 5 ]] || [[ ! -e "${pid_file}" ]]; do
  sleep $(( counter++ ))
done

# Create the lock file if it does not exist; otherwise give up
if [[ -e "${pid_file}" ]]; then
  echo "Lock file ${pid_file} still exists after counter ended" >&2
  exit 1
else
  touch "${pid_file}"
fi

# Find the first /dev/ephemeralN that is free or already points at this device
kern_name=${1}
incr=0
while [[ -e "/dev/ephemeral${incr}" ]] && [[ $(readlink "/dev/ephemeral${incr}") != "${kern_name}" ]]; do
  incr=$(( incr + 1 ))
done

# Remove the lock file
rm "${pid_file}"

echo "ephemeral${incr}"
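One caveat with the script above: the existence check and the touch are two separate steps, so two instances could both pass the check before either creates the file. A minimal sketch of one way to close that gap, assuming bash's noclobber option is acceptable here. The helper names acquire_lock and release_lock are hypothetical and not part of the original script:

```shell
#!/usr/bin/env bash
# Sketch only: acquire_lock and release_lock are hypothetical helper names.

# Try to create the lock file in a single step. With noclobber set, the
# redirection fails if the file already exists, so there is no window
# between "check" and "create". Retries with a growing delay.
acquire_lock() {
  local lock_file="$1" max_tries="${2:-5}" try=0
  while [ "${try}" -lt "${max_tries}" ]; do
    if ( set -o noclobber; echo "$$" > "${lock_file}" ) 2>/dev/null; then
      return 0
    fi
    sleep "${try}"
    try=$(( try + 1 ))
  done
  echo "Lock file ${lock_file} still exists after ${max_tries} attempts" >&2
  return 1
}

# Remove the lock file; -f keeps this quiet if it is already gone.
release_lock() {
  rm -f "$1"
}
```

In the script this would replace the until/if block, e.g. acquire_lock "${pid_file}" || exit 1, with release_lock "${pid_file}" registered in an EXIT trap set after the lock is acquired. A lock left behind by a killed process would still need manual cleanup.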
How would udevd handle that failure (or if not handled, how is that failure communicated)?
The failure will be communicated by writing the message to stderr (echo "Lock file ${pid_file} still exists after counter ended" >&2) and exiting with status 1 (exit 1). It should look like this in the systemd logs:
$ journalctl -u systemd-udevd
Sep 3 21:57:43 test-vm systemd-udevd[2998]: failed to execute '/usr/local/bin/nextephemeraldevice.sh nvme1n1': Lock file /tmp/nextephemeraldevice.sh.lock still exists after counter ended
Also, according to the freedesktop documentation, the default udev event timeout is 180 seconds.
One other thing: next ephemeral should only run for EC2 Instance Store volumes, not for EBS volumes.
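A rough sketch of how the script itself could guard against non-instance-store devices. Both the sysfs path (/sys/block/<device>/device/model) and the model string "Amazon EC2 NVMe Instance Storage" are assumptions here and should be checked on a real instance. The function name is_instance_store is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: is_instance_store is a hypothetical helper.
# Assumptions: the NVMe model is readable from sysfs at
# <sys_root>/<kernel name>/device/model, and instance store devices
# report a model starting with "Amazon EC2 NVMe Instance Storage".
is_instance_store() {
  local kern_name="$1" sys_root="${2:-/sys/block}"
  local model_file="${sys_root}/${kern_name}/device/model"

  # An unreadable or missing model file is treated as "not instance store".
  [ -r "${model_file}" ] || return 1

  # The trailing * tolerates space padding after the model string.
  case "$(cat "${model_file}")" in
    "Amazon EC2 NVMe Instance Storage"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Near the top of the script this could be used as is_instance_store "${kern_name}" || exit 1. The same filtering could also be done in the udev rule itself by matching on the model attribute, which would avoid running the script at all for EBS volumes.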