modernisation-platform
Implement automated snapshot lifecycles
User Story
As a Modernisation Platform Engineer
I want to implement the automated expiration of EBS snapshots
So that we strike a balance between maintaining snapshots and keeping down costs
User Type(s)
Modernisation Platform Customer
Value
While we create and maintain snapshots on behalf of customers with AWS Backup, there are other paths for customers to create backups, such as the CreateImage API, which creates snapshots with a description like this:
Created by CreateImage(i-1111111111111111) for ami-222222222222
We could provide Modernisation Platform customers the ability to delete snapshots manually, but rather than place the burden on them we think that we should automatically expire snapshots in line with our existing durations for AWS Backup.
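A minimal sketch of how snapshots created through CreateImage could be recognised from the description format shown above. The function name and the regular expression are illustrative assumptions, not code from the platform, and the pattern only covers the description shape quoted in this ticket.

```python
import re
from typing import Optional, Tuple

# Assumed pattern, based only on the example description in this ticket:
#   Created by CreateImage(i-1111111111111111) for ami-222222222222
CREATE_IMAGE_RE = re.compile(
    r"^Created by CreateImage\((i-[0-9a-f]+)\) for (ami-[0-9a-f]+)"
)


def parse_create_image_description(description: str) -> Optional[Tuple[str, str]]:
    """Return (instance_id, ami_id) if the description matches, else None."""
    match = CREATE_IMAGE_RE.match(description)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

A snapshot whose description does not match (for example one created by hand) returns `None`, so it can be handled separately.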
Assumptions / Hypothesis / Questions / Unknowns
Definition of done
- [ ] data lifecycle manager implemented across accounts
- [ ] old snapshots expired in line with policy
- [ ] modernisation platform user guide updated
Reference
How to write good user stories
Amazon Data Lifecycle Manager
Later tickets are to be created to get people to use it and to clear up any existing old backups.
Steps required.
We need to create a script that does the deleting.
We need to put the script on git, like something here -
We then need to use the lambda module to create a Lambda that runs that script.
Also need to make sure we create a user in each account with the correct permissions to run the lambda.
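As a rough illustration of the selection step such a script would need, the sketch below picks out snapshots older than a retention period. It is written in Python for illustration only and is not the script or the Go code mentioned in this thread. The 30-day default, the function name and the constant name are assumptions. The input dictionaries are assumed to carry the `SnapshotId` and `StartTime` fields that the EC2 DescribeSnapshots response returns.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional

# Assumed retention period; the real value should follow the AWS Backup policy.
RETENTION_DAYS = 30


def expired_snapshot_ids(
    snapshots: Iterable[Dict[str, Any]],
    retention_days: int = RETENTION_DAYS,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the IDs of snapshots started before the retention cutoff.

    Each snapshot dict needs a "SnapshotId" string and a timezone-aware
    "StartTime" datetime.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    cutoff = current - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]
```

The returned IDs would then be passed to the EC2 DeleteSnapshot call by the Lambda. Keeping the selection as a pure function makes it easy to test without touching an AWS account. Note that a snapshot still backing a registered AMI cannot be deleted until that AMI is deregistered.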
https://github.com/ministryofjustice/modernisation-platform/pull/4599
Still need to create a script to do it. I created one; however, it didn't remove AWS Backup related snapshots!
I was also looking at this and put together the Go lambda code, which I sent to the team on Slack. Thank you to ChatGPT for helping produce that.
You can find the code here: https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/blob/main/modules/backup/main.tf
Changed this to delete backups older than 30 days. PR https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/pull/257 raised.
Completed
Now complete. Final PR https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/pull/259
Not complete yet, still need a new release and to test and update references to it
Had some issues so all of the above has been backed out. It was an issue that appeared to relate to production.
Changes tested and they work. For example, we removed the backup plans (to test), and for Sprinkler the run amended the plan name (no longer 120 days) and built successfully. I will make the changes in production on Monday 4th, when I can change the baselines, create a new release and put this into the modernisation-platform repo.
To confirm, a re-apply has been completed on both of the above and it returns no changes.
Release added for this now (https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/pull/271) and I will also add another for modernisation-platform after making a new release for the above.
About to test against portal development environment
The changes were tested and worked as expected. No backups were impacted. This was pushed through to production late on Wednesday 6th September (after 17:30) and, other than a Lambda issue (it used the Node.js 12 runtime, which is no longer available in AWS), it all worked successfully. This updates the backup plan to the new value and removes the old one. This applies to production; non-production was always set to delete after 30 days.
The Lambda has since been corrected and now uses Node.js 18.