ec2-snapper
Deletion options don't work well with long-running instances?
Imagine you have an instance with `Name` tag `foo` running in an ASG, and you're using `ec2-snapper` to take a snapshot of it once per day. Each time that instance crashes or is redeployed, the old instance is terminated and a new one, with the same `Name` tag, is brought in to replace it.
If you try to clean up old snapshots for this instance with `ec2-snapper delete --ami-name=foo`, you'll end up with one of the following scenarios:
- Simple case: The ASG has just terminated and replaced the instance, so for a short while you end up with two instances (one terminated, one running) with the same `Name` tag. That will lead to an error.
- More complicated case: Let's say you had instance ID `i-12345` running in the ASG for 2 weeks. You took a snapshot each day, so you now have 14 snapshots. The `delete` command is configured to delete snapshots older than 15 days, so nothing has been deleted yet. On day 15, instance `i-12345` crashes and the ASG automatically replaces it with a new instance with ID `i-67890`. A few hours later, the `delete` command runs, and its first step is to look up the instance ID by name. It looks up `foo` and finds the new ID `i-67890` for it. It then tries to clean up all snapshots for `i-67890`, but of course, that's a new ID, so it finds no snapshots for it and, unfortunately, does not clean up any of the snapshots for the old ID `i-12345`.

Over time, this sort of bug results in lots of snapshots being left behind for older instance IDs.
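To make the failure mode concrete, here is a minimal in-memory sketch of the name → instance ID → snapshots flow described above. It is not `ec2-snapper`'s code and makes no AWS calls; the `Instance`/`Snapshot` classes, the function names, the integer day numbers, and reading "older than 15 days" as "strictly more than 15 days old" are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical in-memory model for illustration only: this is not
# ec2-snapper's source and it makes no AWS API calls.


@dataclass
class Instance:
    instance_id: str
    name: str  # value of the instance's Name tag


@dataclass
class Snapshot:
    snapshot_id: str
    instance_id: str  # ID of the instance the snapshot was taken from
    created_day: int  # simplified: day number on which it was taken


def lookup_instance_id(instances, name):
    """Resolve a Name tag to a single instance ID (the intermediary step)."""
    matches = [i.instance_id for i in instances if i.name == name]
    if len(matches) != 1:
        # The "simple case": two instances share the Name tag, so this fails.
        raise ValueError(f"expected 1 instance named {name!r}, found {len(matches)}")
    return matches[0]


def snapshots_to_delete(instances, snapshots, name, today, older_than_days):
    """Select expired snapshots by instance ID, as described in the issue."""
    instance_id = lookup_instance_id(instances, name)
    return [
        s.snapshot_id
        for s in snapshots
        if s.instance_id == instance_id and today - s.created_day > older_than_days
    ]


# "More complicated case": i-12345 took snapshots on days 1-14, then was
# replaced by i-67890, which has taken snapshots on days 15-30.
snapshots = [Snapshot(f"snap-{d}", "i-12345", d) for d in range(1, 15)]
snapshots += [Snapshot(f"snap-{d}", "i-67890", d) for d in range(15, 31)]
# As in the issue, the name lookup only finds the new instance.
instances = [Instance("i-67890", "foo")]

expired = snapshots_to_delete(instances, snapshots, "foo", today=30, older_than_days=15)
print(expired)  # [] even though snap-1 .. snap-14 are well past 15 days old
```

On day 30, every snapshot from `i-12345` is past the 15-day cutoff, yet none is selected, because the filter only ever sees the current instance ID. Passing a list with two `foo` instances to `lookup_instance_id` reproduces the simple case as a `ValueError`.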
Perhaps we should tag the actual snapshots with the instance name and look up snapshots by name directly, instead of going through the instance ID intermediary?
Yep, I think that's the right approach. This humble tool was written years ago for the use case of backing up a single, presumably persistent server. It was never really designed for the ASG use case, and I think the approach you propose sounds solid. PRs welcome!
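A rough sketch of the name-tag approach proposed above, again as a hypothetical in-memory model rather than `ec2-snapper`'s code. The tag key (`Name`), the function names, and the integer day numbers are assumptions; the issue only says that snapshots should carry the instance name and be looked up by it.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the proposed approach, not ec2-snapper's code:
# each snapshot carries the instance name as a tag, and cleanup filters
# on that tag instead of resolving the name to an instance ID first.
NAME_TAG = "Name"  # assumed tag key; the issue does not specify one


@dataclass
class Snapshot:
    snapshot_id: str
    instance_id: str
    created_day: int  # simplified: day number on which it was taken
    tags: dict = field(default_factory=dict)


def take_snapshot(snapshot_id, instance_id, instance_name, day):
    """Copy the instance's name onto the snapshot when it is created."""
    return Snapshot(snapshot_id, instance_id, day, {NAME_TAG: instance_name})


def snapshots_to_delete(snapshots, name, today, older_than_days):
    """Select expired snapshots by their own name tag; no instance lookup."""
    return [
        s.snapshot_id
        for s in snapshots
        if s.tags.get(NAME_TAG) == name and today - s.created_day > older_than_days
    ]


# Same history as in the issue: i-12345 (days 1-14), then i-67890 (days 15-30).
snapshots = [take_snapshot(f"snap-{d}", "i-12345", "foo", d) for d in range(1, 15)]
snapshots += [take_snapshot(f"snap-{d}", "i-67890", "foo", d) for d in range(15, 31)]

expired = snapshots_to_delete(snapshots, "foo", today=30, older_than_days=15)
print(expired)  # snap-1 .. snap-14, even though i-12345 no longer exists
```

Because the name is stamped on each snapshot at creation time, replacing the instance doesn't matter, and two instances briefly sharing the name can no longer cause a lookup error. Against the real API, this would presumably become a describe call with a `tag:<key>` filter, which EC2's describe operations support. Snapshots taken before such a change would carry no tag, so they would still need a one-off backfill or the old ID-based path.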