Create a Delayed Release Process

Open misterek opened this issue 1 year ago • 4 comments

What I'd like: The ability to delay BottleRocket updates, while still having them be automated.

Description: Increasingly, folks are updating nodes automatically. A tool like Karpenter can watch the SSM parameters to detect drift and automatically update the nodes (https://aws.amazon.com/blogs/containers/how-to-upgrade-amazon-eks-worker-nodes-with-karpenter-drift/). BottleRocket Updater does something similar. For the most recent release, Karpenter started updating nodes before the GitHub Release was even announced.

Automatically updating is desirable, as it allows companies to stay up to date with little effort. However, organizations often want changes like this to move through different environments. They may want a change to sit in dev for a few days, then QA for a few days, before moving to production.

It would be a helpful addition to the BottleRocket Release process if a workflow like that could be supported, while still being automated. This would give organizations a period where they could identify issues with a release, and delay the automated process for production environments if needed.

Potential Implementation: Instead of having a single SSM parameter for "latest release", there could be parameters for:

  • Latest release -- Updated immediately on release
  • Latest minus 3 days -- Updated 3 days after release
  • Latest minus 7 days -- Updated 7 days after release

This would, I believe, also require changes in Karpenter or BottleRocket Updater, but it would allow us to configure Dev to be latest, QA to be 3 days behind, and Prod to be 7 days behind.
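
As a rough illustration of the idea above, here is a minimal Python sketch of how a "latest minus N days" value could be resolved from a release history. Everything in it is an assumption made for illustration: the `Release` type, the `delayed_release` helper, the version numbers, and the dates are all hypothetical, and it does not reflect how Bottlerocket, Karpenter, or the SSM parameters actually work.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Release:
    """Hypothetical record of one release and when it was published."""
    version: str
    released_at: datetime


def delayed_release(
    releases: Sequence[Release], delay_days: int, now: datetime
) -> Optional[Release]:
    """Return the newest release that is at least `delay_days` old at `now`.

    Returns None when no release is old enough yet.
    """
    cutoff = now - timedelta(days=delay_days)
    eligible = [r for r in releases if r.released_at <= cutoff]
    return max(eligible, key=lambda r: r.released_at, default=None)


# Made-up release history and "current time", for illustration only.
HISTORY = [
    Release("1.0.0", datetime(2024, 2, 20, tzinfo=timezone.utc)),
    Release("1.0.1", datetime(2024, 3, 3, tzinfo=timezone.utc)),
    Release("1.0.2", datetime(2024, 3, 7, tzinfo=timezone.utc)),
]
NOW = datetime(2024, 3, 8, tzinfo=timezone.utc)

# Delay in days per environment, matching the Dev/QA/Prod example above.
CHANNELS = {"dev": 0, "qa": 3, "prod": 7}

RESOLVED = {
    env: delayed_release(HISTORY, days, NOW) for env, days in CHANNELS.items()
}

if __name__ == "__main__":
    for env, release in RESOLVED.items():
        print(env, release.version if release else "no eligible release")
```

With this made-up history, dev resolves to 1.0.2, QA to 1.0.1, and prod to 1.0.0. The sketch only picks by age; it does not handle a release that is found to be bad and needs to be skipped.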

Potential Issues: There are plenty of edge cases here. What if a release does have a problem that's found after a day? How does the Release Process handle that? What if there are multiple releases quickly?

Any alternatives you've considered:

This could likely be implemented somehow at the Karpenter/BottleRocket Updater level, and may even be more appropriate there.

Additionally, I believe any organization could also accomplish this by creating their own copies of the AMIs, but I was thinking of something that would be a more generally available process.

misterek avatar Mar 08 '24 14:03 misterek

Thanks for cutting this issue @misterek! I think this is similar to https://github.com/bottlerocket-os/twoliter/issues/497. I think the main difference is providing this as distinct SSM parameters for things like Karpenter vs doing something in the OS. I think there is value in having each issue since they solve the problem differently but end with the same result of having a way to control the velocity of updates.

yeazelm avatar Mar 08 '24 22:03 yeazelm

You are 100% right, it is similar. Apologies for not finding that one.

Probably the core issue is figuring out how to keep track of a history of releases and release dates. I was thinking of that with distinct SSM parameters, but perhaps that would make a better feature for Karpenter (search for AMIs by xyz parameters, and use the one that was most recent as of now - x days). I'm somewhat ambivalent on the implementation, and you folks are probably better placed than I am to think through the best way to do this. If you'd like me to, I'd be happy to open an issue over in Karpenter if you think that's more appropriate.

misterek avatar Mar 08 '24 22:03 misterek

There is already a relevant issue open in Karpenter, https://github.com/aws/karpenter-provider-aws/issues/5382, which I think is the right place to solve this, not Bottlerocket SSM params.

mikestef9 avatar Mar 29 '24 00:03 mikestef9

I'm fine with that solution as well. We may need a separate solution for BottleRocket Updater, but we are moving to Karpenter in general.

misterek avatar Mar 29 '24 13:03 misterek