
0.2.0: Cleanup BRS when the operator is removed from a node

Open cbgbt opened this issue 2 years ago • 2 comments

Issue or Feature Request: BRSs currently have ownerReferences to the k8s Node object that the BRS is associated with, meaning that if the Node is deleted, the BRS will be deleted with it. This is great, because it allows the controller to remove the BRS from its ActiveSet.

I foresee an issue though: If a customer removes the brupop label from their node in the middle of an update, the brupop agent/daemonset will be removed from the host, causing updates to cease; however, the BRS in question will not be deleted until the corresponding Node is deleted.

We should create a strategy by which BRSs are cleaned up when the daemonset is removed, or allow the controller to delete BRSs that time out and appear to be stuck indefinitely.
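As a rough illustration of the timeout option, here is a minimal sketch of how a controller might pick out BRSs that look stuck. It uses only the standard library. `ShadowStatus`, its fields, and `stuck_shadows` are hypothetical names invented for this example; they are not the actual brupop types, and the timeout value would be a policy decision.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified view of a BRS as seen by the controller.
/// These fields are assumptions for illustration, not the real BRS schema.
struct ShadowStatus {
    /// Name of the Node this BRS is associated with.
    node_name: String,
    /// When the agent last changed the state of this BRS.
    last_transition: Instant,
    /// True while an update is in flight for the node.
    update_in_progress: bool,
}

/// Returns the node names of BRSs that are mid-update and have not
/// changed state for longer than `timeout`.
fn stuck_shadows(shadows: &[ShadowStatus], now: Instant, timeout: Duration) -> Vec<String> {
    shadows
        .iter()
        .filter(|s| {
            s.update_in_progress
                && now.saturating_duration_since(s.last_transition) > timeout
        })
        .map(|s| s.node_name.clone())
        .collect()
}
```

The controller could then delete the returned BRSs, or only flag them, depending on how aggressive the cleanup should be.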

cbgbt • Nov 02 '21 17:11

I've added some text about this case to the README for now, so there is at least documentation to help customers avoid getting stuck this way. However, having an automated solution here would be ideal.

cbgbt • Jan 25 '22 19:01

If the controller keeps a reflector of Nodes, it won't have to make an additional API call to check for label existence.
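A minimal sketch of that idea, assuming the controller mirrors Node labels into a local cache that is updated from watch events. A plain `HashMap` stands in for a real reflector store here, and `NodeCache`, `orphaned_shadows`, and the label key are all hypothetical placeholders (the real brupop label key should be taken from the README).

```rust
use std::collections::{BTreeMap, HashMap};

/// Placeholder label key, an assumption for this sketch only.
const BRUPOP_LABEL_KEY: &str = "example.com/brupop-enabled";

/// Local mirror of Node labels, kept current from watch events so that
/// label checks need no extra API call.
#[derive(Default)]
struct NodeCache {
    labels_by_node: HashMap<String, BTreeMap<String, String>>,
}

impl NodeCache {
    /// Record a Node that was added or modified.
    fn apply(&mut self, node_name: &str, labels: BTreeMap<String, String>) {
        self.labels_by_node.insert(node_name.to_string(), labels);
    }

    /// Forget a Node that was deleted.
    fn remove(&mut self, node_name: &str) {
        self.labels_by_node.remove(node_name);
    }

    /// `Some(true/false)` if the Node is cached, `None` if it is unknown.
    fn has_brupop_label(&self, node_name: &str) -> Option<bool> {
        self.labels_by_node
            .get(node_name)
            .map(|labels| labels.contains_key(BRUPOP_LABEL_KEY))
    }
}

/// BRSs whose Node still exists in the cache but no longer carries the label.
/// Unknown Nodes are skipped, since ownerReferences already cover Node deletion.
fn orphaned_shadows<'a>(cache: &NodeCache, shadow_node_names: &'a [String]) -> Vec<&'a str> {
    shadow_node_names
        .iter()
        .map(String::as_str)
        .filter(|name| cache.has_brupop_label(name) == Some(false))
        .collect()
}
```

Treating an unknown Node differently from an unlabeled one avoids deleting a BRS merely because the cache has not yet synced.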

cbgbt • Apr 05 '22 21:04