Optionally delete AMIs when `AWS::ImageBuilder::Image` is deleted or replaced.
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
Add an option to the AWS::ImageBuilder::Image Cfn resource that enables automatic deletion of the underlying AMI (EBS volumes and snapshots included) when the Image Builder image is deleted or replaced.
Potentially something similar to the UpdateMethod on AWS::SSM::Document (docs) but with imagebuilder:StartResourceStateUpdate request parameters (docs).
Resources:
  ImageBuilderLifecycleRole:
    Type: AWS::IAM::Role
    Properties:
      # ...
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/EC2ImageBuilderLifecycleExecutionPolicy
  MyImage:
    Type: AWS::ImageBuilder::Image
    Properties:
      # ...
      # imagebuilder:StartResourceStateUpdate request parameters.
      Lifecycle:
        ExecutionRole: !GetAtt ImageBuilderLifecycleRole.Arn
        IncludeResources:
          Amis: true
          Snapshots: true
        State:
          Status: DELETED
The Cfn resource handler must wait for underlying resource deletion to complete before marking Cfn resource deletion as successful/failed (i.e. wait for imagebuilder:GetLifecycleExecution to return execution completion).
This ensures the lifecycle execution role isn't deleted before execution occurs.
It also requires lifecycle executions to complete reasonably quickly (from a few seconds to at most 1-2 minutes) to prevent long stack updates.
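As a rough illustration of the waiting behaviour described above, here is a minimal shell sketch using the AWS CLI. The function name, the 120-second default timeout, and the 5-second poll interval are assumptions made for this example, and the status values (SUCCESS, FAILED, CANCELLED) are what I understand imagebuilder:GetLifecycleExecution to return; a real resource handler would use the SDK rather than the CLI.

```shell
# Poll imagebuilder:GetLifecycleExecution until the execution reaches a
# terminal state. Returns 0 on SUCCESS, 1 on failure/cancellation (or if the
# CLI call itself fails), 2 on timeout.
# Usage: wait_for_lifecycle_execution EXECUTION_ID [TIMEOUT_SECONDS] [POLL_SECONDS]
wait_for_lifecycle_execution() {
  execution_id=$1
  timeout=${2:-120}  # assumption: executions should finish within ~2 minutes
  interval=${3:-5}   # assumption: poll every 5 seconds
  elapsed=0
  while :; do
    status=$(aws imagebuilder get-lifecycle-execution \
      --lifecycle-execution-id "$execution_id" \
      --query 'lifecycleExecution.state.status' \
      --output text) || return 1
    case $status in
      SUCCESS) return 0 ;;
      FAILED | CANCELLED) return 1 ;;
    esac
    # Any other status (e.g. PENDING, IN_PROGRESS) means it is still running.
    if [ "$elapsed" -ge "$timeout" ]; then
      return 2
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

Only once this returns would the handler report the Cfn resource deletion as successful or failed, so the execution role is still present while the lifecycle execution runs.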
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
When an AWS::ImageBuilder::Image is deleted or replaced, the underlying AMI isn't deleted. This causes AMIs to accumulate indefinitely. AWS::ImageBuilder::LifecyclePolicy doesn't seem to work well for setups that replace the backing AWS::ImageBuilder::ImageRecipe on each Cfn stack update.
We currently use Image Builder workflows to bake a new Linux AMI on each deployment. Our workflows run an SSM document shell script which contains the Git revision hash of the source repository with the system configuration. Here's the general format of the SSM document shell script:
# Switch from ssm-user to the default user.
sudo su ec2-user
# Install RPM packages.
sudo dnf install --assumeyes curl-minimal git
# Install Nix.
curl --fail --location https://install.determinate.systems/nix/tag/v3.1.1 --proto '=https' --show-error --silent --tlsv1.2 | sh -s -- install --no-confirm
. /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh
# Setup Nix flake registry.
nix registry add nixpkgs github:NixOS/nixpkgs/{Git revision hash}
nix registry add system-manager github:numtide/system-manager/{Git revision hash}
# Install Nix packages.
nix profile install system-manager
# Apply system-manager configuration (installs system-wide packages and sets up systemd units).
sudo $(command -v system-manager) pre-populate --flake 'git+{Git HTTPs URL}&rev={Git revision hash}#{system-manager flake output key}'
Each Cfn stack update:
- Creates a new Image Builder image recipe (AWS::ImageBuilder::ImageRecipe) with a new version number.
  - Image Builder complains if a property requiring replacement changes (e.g. parent AMI) but the version number doesn't change (e.g. resource with version 0.0.0/1 already exists).
    - We use 0.0.{Git commit Unix timestamp modulo 2^30}.
- Creates a new SSM document (AWS::SSM::Document).
- Creates a new Image Builder workflow (AWS::ImageBuilder::Workflow) with a new version number.
  - Image Builder complains if a property requiring replacement changes (e.g. data) but the version number doesn't change (e.g. resource with version 0.0.0/1 already exists).
    - We use 0.0.{Git commit Unix timestamp modulo 2^30}.
- Creates a new Image Builder image (AWS::ImageBuilder::Image).
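For reference, the version scheme mentioned in the list above can be computed roughly like this. The function name is made up for this example, and using the committer timestamp (`%ct`) rather than the author timestamp is an assumption.

```shell
# Print a version of the form 0.0.{Unix timestamp modulo 2^30}.
# Usage: image_builder_version UNIX_TIMESTAMP
image_builder_version() {
  echo "0.0.$(( $1 % 1073741824 ))"  # 1073741824 = 2^30
}

# Example with the latest commit (run inside a Git work tree):
#   image_builder_version "$(git log -1 --format=%ct)"
```

Note that two commits with the same timestamp produce the same version, so this relies on deployments coming from distinct commit times.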
The image selection criteria for AWS::ImageBuilder::LifecyclePolicy require recipe selectors to include both the name AND version. Since the image recipe version changes on each deployment, the lifecycle rule selection criteria can't use the image recipe selector.
Are you currently working around this issue?
We set the DeletionPolicy (docs) and UpdateReplacePolicy (docs) on the AWS::ImageBuilder::Image resource to Retain so the image can be picked up by Image Builder lifecycle policies.
The lifecycle policy for each image lineage:
- Includes images with a specific image lineage tag.
- Excludes images with a Git revision tag pointing to the latest revision.
- Excludes AMIs launched within a past time period.
Image lineages use an age-based deletion policy. Once all images are deleted for a deprecated image lineage (upwards of a day for a lifecycle execution to occur), the deletion policy for the image lineage can be removed.
This requires using fixed image recipe names, image tags, and lifecycle policy image tag selectors to identify images from the same lineage (i.e. from recipes with the same name and any version).
The workaround requires launching the image at least once within the past time period. This can be problematic for setups where auto scaling groups have a fixed size since instance launches may be an infrequent event after the deployment (mostly caused by outages and hardware maintenance).
If an active AMI isn't launched frequently, this workaround isn't able to handle deployment rollbacks correctly. Suppose we have this deployment history:
- ✅ Deployment 1
- ❌ Deployment 2
When doing deployment 2, a new lifecycle rule will be created which only excludes the image from deployment 2. If it runs during the deployment, it can accidentally delete the image from deployment 1 and break rollback.
Additional context
The image deletion functionality in the AWS console switches between imagebuilder:DeleteImage (docs) and imagebuilder:StartResourceStateUpdate (docs) depending on whether users select the options to delete the underlying AMI, EBS, and ECR resources.
This is because the former doesn't cross service boundaries (strictly deletes Image Builder resources) while the latter does. As a result, the latter asks for an IAM role to assume for cleanup.
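The same switch between the two calls can be sketched with the AWS CLI. The function name and the choice to key the behaviour on whether a role ARN is supplied are assumptions for this example, and the shorthand syntax for `--state` and `--include-resources` is written from memory of the CLI reference, so please double-check it before relying on it.

```shell
# Delete an Image Builder image, optionally with its underlying resources.
# Usage: delete_image IMAGE_BUILD_VERSION_ARN [EXECUTION_ROLE_ARN]
delete_image() {
  image_arn=$1
  role_arn=${2:-}
  if [ -z "$role_arn" ]; then
    # Only removes the Image Builder resource; the AMI and snapshots remain.
    aws imagebuilder delete-image --image-build-version-arn "$image_arn"
  else
    # Crosses service boundaries: also deletes the AMI and its snapshots,
    # using the supplied role for the cleanup.
    aws imagebuilder start-resource-state-update \
      --resource-arn "$image_arn" \
      --state status=DELETED \
      --execution-role "$role_arn" \
      --include-resources amis=true,snapshots=true
  fi
}
```

The requested Cfn option would effectively let the resource handler take the second branch on delete or replace.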