Automatically scale-down non-production Kubernetes clusters when not in use

Open danielvaughan opened this issue 3 years ago • 6 comments

Describe the pattern you'd like to propose

One of the great features of the cloud is the ability to spin up and shut down infrastructure on demand, paying only for the time it is running. However, infrastructure is often left running 24/7, unnecessarily consuming energy and racking up costs. This is particularly true for Kubernetes clusters, which are composed of multiple virtual machines and typically cost hundreds of dollars per month to run.

Describe specific emission impact from this pattern

Running non-production clusters, such as those used for development and testing, only during work hours and not at night or on weekends can reduce emissions and costs by around 75%. This adds up to a considerable saving, especially when many clusters are in use.

Investing in this type of automation can significantly reduce emissions with little or no inconvenience.
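As a rough sanity check on the figure above, here is a minimal Python sketch of the arithmetic. It assumes that emissions and cost scale with the hours the worker nodes are running, which is a simplification, and the 8-hour and 10-hour working days are illustrative assumptions rather than numbers from this proposal.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def weekly_runtime_reduction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of weekly runtime avoided compared with running 24/7."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# Assumed schedules: 5 working days of 8 or 10 hours each.
print(f"{weekly_runtime_reduction(8, 5):.0%}")   # 76%
print(f"{weekly_runtime_reduction(10, 5):.0%}")  # 70%
```

Under these assumptions the saving lands in the 70% to 76% range, which is consistent with the rough 75% estimate.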

References to this pattern

AKS on Azure, EKS on AWS and GKE on Google Cloud only charge for the worker node pools for the Kubernetes cluster, not the control plane. This means worker nodes can safely be shut down and then restarted later. This is typically achieved by changing the size of a worker node pool to zero.

All three cloud providers also provide a mechanism to trigger actions on a schedule. This can be used to programmatically notify users and then automatically scale down the worker nodes of a cluster at the end of the day, and scale them up again in time for people coming into work.
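To make the schedule idea concrete, here is a minimal Python sketch of the decision a scheduled job might make each time it runs. The working hours, working days, daytime node count and the function name `desired_node_count` are all hypothetical values chosen for illustration. Actually applying the result to a node pool would go through the cloud provider's own CLI or API, which is not shown here.

```python
from datetime import datetime, time

# Hypothetical schedule, chosen only for illustration.
WORK_START = time(8, 0)
WORK_END = time(18, 0)
WORK_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4
DAYTIME_NODES = 3            # assumed node pool size during work hours


def desired_node_count(now: datetime) -> int:
    """Return the node pool size a non-production cluster should have at `now`."""
    in_work_hours = (
        now.weekday() in WORK_DAYS
        and WORK_START <= now.time() < WORK_END
    )
    # Scale the worker node pool to zero outside working hours.
    return DAYTIME_NODES if in_work_hours else 0


print(desired_node_count(datetime(2022, 10, 7, 15, 10)))  # Friday afternoon
print(desired_node_count(datetime(2022, 10, 8, 15, 10)))  # Saturday afternoon
```

Keeping the decision in a pure function like this makes the schedule easy to test separately from whatever mechanism resizes the node pool.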

Additional context

Alternatively, clusters can be built and destroyed completely using an automation tool like Terraform, with the same scheduling mechanism creating a new cluster at the start of the day and destroying it at the end of the day.

A more advanced approach is to use third-party tools to destroy or scale down the node pool of a cluster after an idle period. This may be a better alternative to working on a schedule when it is acceptable to wait for the cluster to restart.
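The idle-period variant comes down to a simple check, sketched below in Python. How "last activity" is measured depends on the tool in use and is not specified here, so `last_activity`, the two-hour `IDLE_TIMEOUT` and the function name `should_scale_down` are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Assumed idle threshold; a real setup would tune this per team.
IDLE_TIMEOUT = timedelta(hours=2)


def should_scale_down(
    last_activity: datetime,
    now: datetime,
    idle_timeout: timedelta = IDLE_TIMEOUT,
) -> bool:
    """Return True when the cluster has been idle for at least `idle_timeout`."""
    return now - last_activity >= idle_timeout


last_seen = datetime(2022, 10, 7, 15, 10)
print(should_scale_down(last_seen, datetime(2022, 10, 7, 16, 0)))   # False
print(should_scale_down(last_seen, datetime(2022, 10, 7, 17, 30)))  # True
```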

danielvaughan, Oct 07 '22 15:10

This is great, @danielvaughan, and would align with the green software patterns we are looking for. Do you want to create a pattern based on the Template in the repo? You can find more specifics in the Guide Section of the patterns website, but let me know if you run into any issues.

dubrie, Oct 13 '22 02:10

Hi @danielvaughan -- It seems that this proposed pattern is covered by pull request https://github.com/Green-Software-Foundation/green-software-patterns/pull/18/files#diff-1b22d201cb4feab7b796bf4cf549e5fbfbe2c09ffad3830eab51d7f7b09874bf

Do you want to add more detail to that pull request or does it cover your proposal?

dubrie, Oct 17 '22 22:10

Hi @dubrie, this looks a bit different. I was proposing scaling down or turning off non-production clusters when not in use. The pull request looks to be more about scaling down non-critical workloads.

danielvaughan, Oct 18 '22 14:10

Hi @dubrie, to be honest, I am struggling a bit with the instructions and the template, especially how to apply the SCI. It would be great if the submission guide had a bit more information and maybe some examples. At the moment it seems a little too academic for the average submitter, myself included, even when I have the spec in front of me.

danielvaughan, Oct 18 '22 14:10

From my POV, the pattern that @dubrie referenced and this one largely overlap in the resulting technique. I also think that the pattern in https://github.com/Green-Software-Foundation/green-software-patterns/pull/18/files#diff-1b22d201cb4feab7b796bf4cf549e5fbfbe2c09ffad3830eab51d7f7b09874bf could be worded better, for example by removing the "=>" and converting it to a full sentence.

My suggestion, @danielvaughan: would it work for you to do the following?

  • Create a new branch with the file https://raw.githubusercontent.com/Green-Software-Foundation/green-software-patterns/a3e86aa5a588a17e2750092032c9d088ec7d9969/docs/catalog/cloud/scale-down-kubernetes-workloads.md
  • Update everything except the SCI Impact
  • Create a new PR

Then we can help you with the SCI Impact part. Would this be a way forward for you, @danielvaughan?

markus-gsf-seidl, Oct 21 '22 04:10

I would argue that we already have a lot in that regard. I am leaving this open for a vote on whether it provides a new perspective over these existing patterns:

  • #141
  • #140
  • https://patterns.greensoftware.foundation/catalog/cloud/time-shift-kubernetes-cron-jobs/
  • https://patterns.greensoftware.foundation/catalog/cloud/scale-down-kubernetes-workloads
  • https://patterns.greensoftware.foundation/catalog/cloud/scale-kubernetes-workloads-based-on-events

markus-gsf-seidl, Jan 17 '23 09:01