ebs-automatic-nvme-mapping

script and udev rule for deterministic ephemeral devices

Open missingcharacter opened this issue 4 years ago • 2 comments

This was tested on systemd-udev

https://www.freedesktop.org/software/systemd/man/udev.html

Worked on this with @thecubed

missingcharacter, Apr 22 '20 03:04

I'm not intimately familiar with the sequence of operations, but isn't there a potential for a conflict if two instances of the next ephemeral script are executed simultaneously?

i.e. two EBS volumes are attached, udevd executes next ephemeral for each, and there are two instances trying to grab /dev/ephemeral1...

You could use a lock file to protect against this but then you end up in a scenario where one invocation can run and succeed, but the other one fails. How would udevd handle that failure (or if not handled, how is that failure communicated)?

oogali, Sep 03 '20 13:09

Hi @oogali, thanks for getting back to me.

but isn't there a potential for a conflict if two instances of the next ephemeral script are executed simultaneously?

Yes, you are right, this could happen. I think the code below could solve this concern:

#!/usr/bin/env bash
# To be used with the udev rule: /etc/udev/rules.d/999-aws-ebs-nvme.rules

script_name="$(basename "$0")"
pid_file="/tmp/${script_name}.lock"

# create the lock file atomically: with noclobber the redirection fails if the
# file already exists, so checking and creating happen in a single step.
# Retry with an increasing sleep (0s, 1s, 2s, 3s, 4s) before giving up.
counter=0
until ( set -o noclobber; echo "$$" > "${pid_file}" ) 2>/dev/null; do
  if [[ ${counter} -eq 5 ]]; then
    echo "Lock file ${pid_file} still exists after counter ended" >&2
    exit 1
  fi
  sleep $(( counter++ ))
done

# remove lock file when this script exits, whatever the reason
trap 'rm -f "${pid_file}"' EXIT

kern_name=${1}
incr=0
# find the first /dev/ephemeralN that is unused or already points at this device
while [[ -e "/dev/ephemeral${incr}" ]] && [[ $(readlink "/dev/ephemeral${incr}") != "${kern_name}" ]]; do
  incr=$(( incr + 1 ))
done
echo "ephemeral${incr}"
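
As an alternative sketch, flock from util-linux ties the lock to an open file descriptor, so a run that crashes cannot leave a stale lock behind. The helper names, the lock path argument, the `dev_dir` argument, and the 10 second wait are assumptions made for illustration; they are not part of the script above and this has not been tested under udev:

```shell
#!/usr/bin/env bash
# Sketch only: helper names, arguments and the timeout are illustrative assumptions.

# Print the first ephemeralN name under dev_dir that is unused or whose
# symlink already points at kern_name.
next_ephemeral_name() {
  local dev_dir="$1" kern_name="$2" incr=0
  while [[ -e "${dev_dir}/ephemeral${incr}" || -L "${dev_dir}/ephemeral${incr}" ]] \
    && [[ "$(readlink "${dev_dir}/ephemeral${incr}")" != "${kern_name}" ]]; do
    incr=$(( incr + 1 ))
  done
  echo "ephemeral${incr}"
}

# Run the lookup while holding an exclusive lock on lock_file. The lock is
# released when the subshell exits and file descriptor 9 is closed.
locked_next_ephemeral_name() {
  local lock_file="$1" dev_dir="$2" kern_name="$3"
  (
    flock -w 10 9 || { echo "Could not lock ${lock_file} within 10 seconds" >&2; exit 1; }
    next_ephemeral_name "${dev_dir}" "${kern_name}"
  ) 9>"${lock_file}"
}
```

From the udev script this could be called as locked_next_ephemeral_name /tmp/nextephemeraldevice.lock /dev "$1". Keeping the name lookup in its own function also makes it easy to try against a scratch directory instead of /dev.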

How would udevd handle that failure (or if not handled, how is that failure communicated)?

The failure is reported by echo "Lock file ${pid_file} still exists after counter ended" >&2 followed by exit 1, and it should look like this in the systemd logs:

$ journalctl -u systemd-udevd
Sep 3 21:57:43 test-vm systemd-udevd[2998]: failed to execute '/usr/local/bin/nextephemeraldevice.sh nvme1n1': Lock file /tmp/nextephemeraldevice.sh.lock still exists after counter ended

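Also, the default udev event timeout is 180 seconds according to the freedesktop documentation, so the script's worst-case wait of about 10 seconds (0+1+2+3+4) stays well within it.

One other thing: the next ephemeral script should only run for EC2 Instance Store volumes, not for EBS volumes.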
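
One way to enforce that is to look at the NVMe model string, which EC2 reports as "Amazon EC2 NVMe Instance Storage" for instance store devices and "Amazon Elastic Block Store" for EBS volumes. A minimal sketch, assuming the model is readable from sysfs under /sys/block/<device>/device/model; the `is_instance_store` helper and its overridable `sys_block` argument are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch only: is_instance_store and its sys_block argument are illustrative.

# Succeed if the given kernel device name (e.g. nvme1n1) reports the
# instance store model string, fail otherwise (including unreadable model).
is_instance_store() {
  local kern_name="$1" sys_block="${2:-/sys/block}"
  local model_file="${sys_block}/${kern_name}/device/model"
  [[ -r "${model_file}" ]] || return 1
  local model
  model="$(< "${model_file}")"
  # The sysfs value is padded with trailing spaces, hence the prefix match.
  [[ "${model}" == "Amazon EC2 NVMe Instance Storage"* ]]
}
```

The script could then start with is_instance_store "$1" || exit 1. The same filter can probably be expressed in the udev rule itself with an ATTRS{model} match, which would avoid running the script for EBS volumes at all.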

missingcharacter, Sep 03 '20 22:09