community icon indicating copy to clipboard operation
community copied to clipboard

Final Snapshot Creation on RDS DB Deletion

Open cogren1 opened this issue 3 years ago • 17 comments

Is your feature request related to a problem? We lost a database once via a manifest getting deleted. We got lucky in that the data loss was not a major problem, but it would have been a huge problem for other databases we have. As automated backups are deleted with the database, it would be a lot more reassuring if we could set it to create a final snapshot on deletion.

Describe the solution you'd like A flag on the DBCluster and DBInstance manifests that allows an S3 bucket name and key to be specified for the snapshot location. It might make more sense for the key to allow a dictionary, so other options like full or partial snapshot can be specified.

Describe alternatives you've considered I've considered using an Admission Controller to do this work, but I wanted to see where the community stood on this feature as it's been disabled in the code since inception with a comment admitting that not allowing a final snapshot is a bad idea (although I can't seem to find that comment anymore). https://github.com/aws-controllers-k8s/rds-controller/blob/v0.1.1/pkg/resource/db_instance/sdk.go#L2642

cogren1 avatar Oct 06 '22 20:10 cogren1

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Mar 14 '23 01:03 ack-bot

I am running into the same issue here. Accidentally deleting the dbinstance from the cluster also deletes all automated backups, leaving you with no data left.

That absolutely can't be the intended behaviour. The controller should allow you to either create a final snapshot on deletion or at least keep all automated backups of the instance. And that should be the default!

dbaumgarten avatar May 23 '23 13:05 dbaumgarten

with a comment admitting that not allowing a final snapshot is a bad idea (although I can't seem to find that comment anymore).

The comment you refer to @cogren1 is here. :)

That absolutely can't be the intended behaviour.

Actually, it was the intended behaviour. Since we strive to have no side-effects from actions in the controller. Deleting a resource deletes the resource entirely by default. You can use the deletion protection annotations to protect against accidental deletion.

As mentioned in the comment referenced above, I'd be open to adding some custom code to the controller to allow for specifying the final snapshot identifier, though. I think adding an actual field to the Spec for this would be a bit overboard.

How does this sound to you, @cogren1 and @dbaumgarten?

Annotation: rds.services.k8s.aws/final-snapshot-identifier=$IDENTIFIER

If the controller sees this annotation on the DBInstance it will use the $IDENTIFIER value for the finalSnapshotIdentifier Input shape's field for the DeleteDBInstance API call.

jaypipes avatar May 23 '23 13:05 jaypipes

Deletion-protection is what I am currently using. It works for now. But sometimes there is the case that you WANT to delete the instance but still keep it's data. Thats were we are currently missing a good solution.

The option for a final snapshot would be nice. However keeping automated backups has the added benefits that you can still roll-back to a state a few days before the delete.

I have no idea how much effort it is to add additional fields to the spec. But finalSnapshots and keepAutomatedSnapshots seem to be such "basic" features, they should be a clearly visible part of the API and not be hidden behind some obscore annotation.

However, if annotations are the only way to implement this with reasonable effort, I am fine with it.

dbaumgarten avatar May 23 '23 13:05 dbaumgarten

@dbaumgarten I'm not opposed to adding actual fields to the Spec if you think that's a more appropriate solution.

jaypipes avatar May 23 '23 14:05 jaypipes

Yes, I absolutely think that these two settings should have proper fields in the spec

dbaumgarten avatar May 24 '23 13:05 dbaumgarten

Yes, I absolutely think that these two settings should have proper fields in the spec

Roger that @dbaumgarten. I can try to get to this towards the end of this week.

jaypipes avatar May 24 '23 13:05 jaypipes

That sounds great! I tried looing into the code myself, but felt a bit lost. While I do have some experience with controller-runtime, that ACK-specific stuff (runtime, code-gen etc.) is not that simple to understand...

dbaumgarten avatar May 25 '23 06:05 dbaumgarten

I tried looing into the code myself, but felt a bit lost.

Particularly understandable for the rds-controller, which is quite a complex beast of custom code.

RedbackThomson avatar Jun 01 '23 18:06 RedbackThomson

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Nov 28 '23 23:11 ack-bot

The final snapshot option is essential for databases and fully automated deployments. This missing parameter prevents us from performing production deployments with ACK, as we consider it unsafe.

As mentioned, there is a blast radius effect here. If the Kubernetes resource is deleted, the RDS instance and all its automatic snapshots are also deleted. This means companies cannot restore the production system in case of an error in Kubernetes manifest.

In large companies, databases and Kubernetes deployments could be managed by different teams. It could be hard to assume that a company lost all its data on AWS due to a YAML indentation error.

To be fair, protection could be enabled via Retain AWS Resources after CR Deletion, but not limit ACK usages.

Is there any plan to implement the final db snapshot?

vmercierfr avatar Nov 29 '23 06:11 vmercierfr

Stale issues rot after 60d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 60d of inactivity. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle rotten

ack-bot avatar Jan 28 '24 07:01 ack-bot

/remove-lifecycle-rotten

This ist still an important issue!

dbaumgarten avatar Jan 28 '24 09:01 dbaumgarten

I'd like to give this a shot.. @dbaumgarten i'm not opposed to injecting a new field into the spec.. However since the controller is GA, we do not want to add features a that could be reverted/renamed in the future (code-generator doesn't support generating mutating webhooks yet). I'd say let's start with an experimental annotation first and decide whether we want to transition them into Spec fields later on.

@dbaumgarten what's the difference you had in mind between finalSnapshots and keepAutomatedSnapshots?

/assign

a-hilaly avatar Jan 28 '24 18:01 a-hilaly

Great! I don't really see a way how this feature could result in backwards-incompatible changes. However I am fine with an annotation.

They both map directly to two parameters of the DeleteDBInstance API-Call: https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DeleteDBInstance.html

DeleteAutomatedBackups: If true (which is the default) all automated snapshots of the DB are delete with the DB SkipFinalSnapshot: If false (default is true), a snapshot is created (end kept) before the DB is deleted

FinalDBSnapshotIdentifier needs to be set when SkipFinalSnapshot is false. I suggest the controller auto-generates an approriate value for it.

It could be debated if it is a good idea to maybe rename the fields DeleteAutomatedBackups -> KeepAutomatedBackups (and invert the logic) and maybe change the default value. However in absense of better arguments I would suggest keeping the Field-Names and default values from the API.

dbaumgarten avatar Jan 29 '24 08:01 dbaumgarten

It will be useful to specify FinalDBSnapshotIdentifier parameter when SkipFinalSnapshot is false, so end-users can choose the final snapshot name.

I agree that AWS RDS API naming is not optimal, but I would recommend to keep the same names/logics. It will be probably easier to implement and document.

vmercierfr avatar Jan 29 '24 13:01 vmercierfr

Having FinalDBSnapshotIdentifier as an optional field is fine. However I would propose to keep it optional and when it's unspecified the controller generates a value based on the DBs identifier and maybe a timestamp.

dbaumgarten avatar Jan 29 '24 13:01 dbaumgarten

Hi folks. I'm happy to see that the design details have been ironed out. This is a very important feature indeed.

Does this sit anywhere on the current roadmap? Do maintainers have bandwidth to support a community contribution here?

cannonpalms avatar Apr 04 '24 14:04 cannonpalms

@cannonpalms i'm currently working on a few feautres for rds-controller and this is one of them, i'll update here as soon as they are merged and new release is made :)

a-hilaly avatar Apr 04 '24 21:04 a-hilaly

Hi folks - this is currently supported in rds v1.4.1 - we have added 3 new annotations to tweak the deletion options of DBInstances and DBClusters. For more details please check the annotations godoc https://github.com/aws-controllers-k8s/rds-controller/blob/main/apis/v1alpha1/annotation.go#L30-L44

I will also add some more documentation about it in our README very soon.

/cc @cannonpalms @dbaumgarten @vmercierfr @cogren1

/close

a-hilaly avatar Jun 12 '24 22:06 a-hilaly

@a-hilaly: Closing this issue.

In response to this:

Hi folks - this is currently supported in rds v1.4.1 - we have added 3 new annotations to tweak the deletion options of DBInstances and DBClusters. For more details please check the annotations godoc https://github.com/aws-controllers-k8s/rds-controller/blob/main/apis/v1alpha1/annotation.go#L30-L44

I will also add some more documentation about it in our README very soon.

/cc @cannonpalms @dbaumgarten @vmercierfr @cogren1

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ack-prow[bot] avatar Jun 12 '24 22:06 ack-prow[bot]

@a-hilaly Thank you very much for adding these features! ❤️

dbaumgarten avatar Jun 13 '24 10:06 dbaumgarten

@a-hilaly Fantastic! Thank you!

cogren1 avatar Jun 13 '24 16:06 cogren1

Thanks, @a-hilaly, for implementing it!

We deployed the v1.4.1 and use this feature to run ACK safely in our environment.

You mentioned it, but documentation has not yet been updated with these annotations, so users should carefully read the Golang code comments.

It was a bit confusing for me to see that when the DeleteAutomatedBackups annotation is not present, automated backups will not be deleted, as this is not the AWS API and AWS CLI default behavior. If AWS users don't see it, it may result in never-deleting automated snapshots and unexpected AWS costs.

vmercierfr avatar Jun 24 '24 16:06 vmercierfr

On the other hand, if dleting backups was the default, users would be surprised that their backups are suddenly gone when deleting a DB. I think suddenly loosing all your data is a worse surprise.

dbaumgarten avatar Jun 25 '24 11:06 dbaumgarten