modernisation-platform

Implement automated snapshot lifecycles

Open davidkelliott opened this issue 1 year ago • 1 comments

User Story

As a Modernisation Platform Engineer
I want to implement the automated expiration of EBS snapshots
So that we strike a balance between maintaining snapshots and keeping down costs

User Type(s)

Modernisation Platform Customer

Value

While we create and maintain snapshots on behalf of customers with AWS Backup, there are other paths for customers to create backups, such as the CreateImage API, which creates snapshots with a description like so:

Created by CreateImage(i-1111111111111111) for ami-222222222222

We could provide Modernisation Platform customers the ability to delete snapshots manually, but rather than place the burden on them we think that we should automatically expire snapshots in line with our existing durations for AWS Backup.
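A minimal sketch of how such snapshots might be recognised from their description, assuming the description always follows the format shown above. The function name and the regular expression are illustrative assumptions, not something taken from the platform code.

```python
import re

# Matches descriptions such as:
#   Created by CreateImage(i-1111111111111111) for ami-222222222222
# Assumption: instance and image IDs are lowercase hexadecimal after the prefix.
CREATE_IMAGE_PATTERN = re.compile(
    r"^Created by CreateImage\((?P<instance_id>i-[0-9a-f]+)\) "
    r"for (?P<image_id>ami-[0-9a-f]+)"
)


def parse_create_image_description(description):
    """Return (instance_id, image_id) for a CreateImage snapshot, else None."""
    match = CREATE_IMAGE_PATTERN.match(description or "")
    if match is None:
        return None
    return match.group("instance_id"), match.group("image_id")
```

Snapshots whose description does not match return None, so they can be left for AWS Backup or another policy to manage.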

Assumptions / Hypothesis / Questions / Unknowns

Definition of done

  • [ ] data lifecycle manager implemented across accounts
  • [ ] old snapshots expired in line with policy
  • [ ] modernisation platform user guide updated

Reference

  • How to write good user stories
  • Amazon Data Lifecycle Manager

davidkelliott avatar Jun 02 '23 14:06 davidkelliott

Later tickets are to be created to get people to use it and to clear up any existing old backups.

SimonPPledger avatar Jun 13 '23 10:06 SimonPPledger

Steps required.

We need to create a script that does the deleting.

We need to put the script on git, like something here -

We need to then use the lambda module to create a lambda and use that script.

Also need to make sure we create a user in each account with the correct permissions to run the lambda.
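As a rough sketch of what the deleting script could look like: the code below assumes snapshot records shaped like the entries returned by the EC2 DescribeSnapshots API (`SnapshotId`, `StartTime`), a placeholder retention of 30 days, and a `delete_snapshot` callable passed in by the caller. All function names here are hypothetical. In a real lambda the callable would wrap the EC2 DeleteSnapshot call.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # assumption: placeholder value, should match the AWS Backup policy


def expired_snapshots(snapshots, now=None, retention_days=RETENTION_DAYS):
    """Return the snapshots whose StartTime is older than the retention period."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [snap for snap in snapshots if snap["StartTime"] < cutoff]


def expire(snapshots, delete_snapshot, dry_run=True, now=None,
           retention_days=RETENTION_DAYS):
    """Delete expired snapshots and return the IDs that were selected.

    With dry_run=True nothing is deleted; the IDs are only reported.
    """
    selected = []
    for snap in expired_snapshots(snapshots, now, retention_days):
        if not dry_run:
            delete_snapshot(snap["SnapshotId"])
        selected.append(snap["SnapshotId"])
    return selected
```

Defaulting to a dry run lets the selection be reviewed in each account before anything is removed.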

ep-93 avatar Jul 19 '23 09:07 ep-93

https://github.com/ministryofjustice/modernisation-platform/pull/4599

Still need to create a script to do it. I created one; however, it didn't remove AWS Backup related snapshots!

ep-93 avatar Jul 20 '23 15:07 ep-93

I was also looking at this and put together the Go lambda code, which I sent to the team on Slack. Thank you to ChatGPT for helping produce that.

SteveLinden avatar Jul 21 '23 15:07 SteveLinden

You can find the code here: https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/blob/main/modules/backup/main.tf

dms1981 avatar Aug 22 '23 10:08 dms1981

Changed this to delete backups older than 30 days. PR https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/pull/257 raised.

SteveLinden avatar Aug 23 '23 14:08 SteveLinden

Completed

SteveLinden avatar Aug 24 '23 10:08 SteveLinden

Now complete. Final PR https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/pull/259

SteveLinden avatar Aug 25 '23 07:08 SteveLinden

Not complete yet, still need a new release and to test and update references to it

davidkelliott avatar Aug 29 '23 09:08 davidkelliott

Had some issues so all of the above has been backed out. It was an issue that appeared to relate to production.

SteveLinden avatar Aug 31 '23 09:08 SteveLinden

Changes tested and they work. For example, we removed the backup plans (to test), and for Sprinkler the run amended the plan name (no longer 120 days) and built successfully. I will make the changes in production on Monday 4th, when I can change the baselines, create a new release and put this into the modernisation-platform repo.

SteveLinden avatar Sep 01 '23 14:09 SteveLinden

To confirm, a re-apply has been completed on both of the above and it returns no changes.

SteveLinden avatar Sep 01 '23 14:09 SteveLinden

Release added for this now (https://github.com/ministryofjustice/modernisation-platform-terraform-baselines/pull/271) and I will also add another for modernisation-platform after making a new release for the above.

SteveLinden avatar Sep 04 '23 07:09 SteveLinden

About to test against portal development environment

SteveLinden avatar Sep 06 '23 07:09 SteveLinden

The changes were tested and worked as expected. No backups were impacted. This was pushed through to production late on Wednesday 6th September (after 17:30) and, other than a lambda issue (it used Node.js 12, which is no longer available in AWS), it all worked successfully. This updates the backup plan to the new value and removes the old one. This is for production; non-production was always set to delete after 30 days.

The lambda has since been corrected and now uses Node.js 18.

SteveLinden avatar Sep 07 '23 13:09 SteveLinden